ABSTRACT


Microcoded systems are used extensively in signal
processing applications. They attain high functional parallelism and
thereby achieve high throughput. Simulation of a system is a software
tool widely used for system development and checking. In this
thesis, a number of microcoded signal processors are proposed and
a simulator for these systems is developed. The simulator permits
programs to be fine-tuned in software prior to making the significant
effort to bring up the target system in hardware. The simulator is
interactive; it allows display of register and memory contents.
Both single-step and full-speed runs are possible. Using this simulator,
the performance of the proposed microcoded systems is evaluated with a few
DSP algorithms.



CHAPTER 1


INTRODUCTION 


1.1 Digital Signal Processing (DSP):

Digital signal processing is concerned with the representation of
signals by sequences of numbers and the processing of these sequences. The
purpose of such processing may be to estimate characteristic parameters
of a signal or to transform a signal into a form which is, in some
sense, more desirable than the original. Signal processing finds wide
application in diverse fields such as biomedical engineering,
acoustics, sonar, radar, seismology, speech communication,
nuclear science and many other areas.

1.2 Approaches to Implementation of Signal Processing Algorithms 

Efficient implementation of DSP algorithms requires high-speed
algebraic manipulation of data, involving mainly multiplication and
addition. The computational requirements of DSP applications range from
a few hundred operations per second to several hundred million
operations per second (Mops). A single, fixed implementation scheme
will clearly be non-optimal with respect to cost and performance.
Besides cost and speed, another important criterion is the accuracy of
implementation. Depending on the particular application or class of
applications, a tradeoff exists between:



- speed

- accuracy

- cost (both hardware and programming)

While software implementation on a general purpose microprocessor
is often cost effective in non-real-time and/or low data-rate
applications, this approach soon proves inadequate as the
amount of data and/or the data rate increases, and a hardware solution
becomes a necessity.

The various hardware solutions can be broadly categorized into 
the following types: 

- General purpose microprocessor with coprocessor,

- Programmable DSP microprocessor,

- Microprogrammable processors.

The first approach is a modest extension of the conventional 
microprocessor, and is satisfactory for many low-to-moderate 
data-rate applications. 

A programmable single-chip digital signal processor is a
microprocessor whose architecture is optimised to process sampled data
at high rates. These architectures continue to be adequate for many
medium to moderately high throughput applications. Existing DSP micros
such as the TMS320, ADSP-2100, DSP56000, WE DSP16 and TS68930 [Spectrum 87]
belong to this category.

The third approach is the most efficient, but also the most expensive,
solution for high throughput requirements in DSP. The processors are
implemented using multiple hardware functional units under microprogram
control, which gives the designer greater control over the system
architecture to meet very high throughput requirements. These
architectures are generally called microcoded systems.

1.3 Microcoded Signal Processors (MCSP)

A microcoded system employs building-block ICs to construct
a signal processor. A comprehensive set of building blocks is
available to enable high performance digital signal processors to be
constructed.

The basic difference in design philosophy between
microprocessor circuits and microcoded circuits is that the functions
which are integrated onto a single microprocessor device are
partitioned into separate devices in a microcoded circuit. Microcoded
circuits offer a high degree of functional parallelism, where many
operations can be performed simultaneously, unlike in a conventional
microprocessor, thereby increasing the throughput.

Since each device in a microcoded system can operate
independently, proper coordination is needed for synchronous operation.
This is achieved by having a single instruction which is
synchronous to a system clock. Each instruction is called a
microinstruction. Each microinstruction contains the control bits for
the various devices in the system at different locations called
'fields'; that is, each field directly controls the corresponding
device of a microcoded system. A sequence of microinstructions is
called a microprogram. Microprograms are written and stored in a
memory called the control store (microcode memory).

During each system clock cycle, a microcode memory location
is accessed and the microcode residing at that location is supplied to
the microcoded devices in the system. The width of the microcode
memory depends on the number of devices in the system. The power of
the microcoded processor lies in the fact that during each clock
cycle each component can execute an instruction, allowing the
system to attain high throughput. Such a processor is invariably
attached to a host computer. One such attached processor is shown in
Fig 1.1. The host machine could be a personal computer, a minicomputer
or a mainframe computer.

1.4 Objectives and Scope of the Thesis

In this thesis an attempt has been made to develop a simulator
for an MCSP system for developing and executing various signal
processing algorithms. The objectives of the current work are
described below:

(i) To define various MCSP architectures, based on performance
and speed, which will work as an attached processor to an external
host.

(ii) To develop a simulator for these architectures using the
ADSP-14xx and ADSP-11xx chip sets.

(iii) To test the simulator through execution of some simple
microprograms.




Fig 1.1 Host-attached Microcoded System












1.5 Organisation of the Thesis


In Chapter 2 various MCSP architectures are proposed and the
functional elements used are briefly described.

The working of the simulator for the three MCSP
architectures is explained in Chapter 3.

In Chapter 4, algorithms such as 1-D convolution and matrix
multiplication are tested on the MCSP systems and their performance is
evaluated. The meta-assembler used is also explained.

Finally, in Chapter 5 we conclude the thesis with suggestions for
further work.



CHAPTER 2 


MICROCODED SIGNAL PROCESSORS: AN OVERVIEW


In this chapter various microcoded architectures are
proposed and discussed. The systems are simulated on an HP-9000 and
tested with test algorithms.


2.1 System 1: Uniprocessor with Local Memory


The block diagram of this system is shown in Fig 2.1. The
system can be divided into three main sections:

(i) Control section,

(ii) Address generation section,

and (iii) Number crunching section.

2.1.1 Control Section:

The main components of the control section are the program
sequencer, the microcode memory and the pipeline register.


The Program Sequencer

The microprogram sequencer's main task is to provide
the appropriate microprogram addressing to support programming
requirements (e.g., looping, jumping, branching, subroutines,
conditional testing and interrupts). During each microinstruction
the program sequencer monitors the conditions and the instruction to
determine the address of the next microinstruction, thereby controlling
the program flow.

Fig 2.1 SYSTEM 1: System with Uniprocessor and Local Memory.

The microprogram memory, which contains the
microinstructions, may be a RAM or a ROM. The communication
between the microprogram memory and the program sequencer is shown in
Fig 2.2.



Fig 2.2: Generation of Next Address

In a complex high performance system the instructions in the
microcode memory will not be fetched sequentially as previously
described. In most cases, sophisticated program flow will be needed.
As with the other devices in the system, the program sequencer
receives its instruction from the microcode memory, as explained earlier.
In essence, the microcode memory instructs the program sequencer
how to get the next instruction. Fig 2.3 shows an example of a
non-sequential program flow. Each instruction gives the sequencer the
information on how to get the next instruction to be executed.
Fig 2.3: Non-sequential Program Flow

These jumps and calls can also be made
conditional upon some external event. The program sequencer generates
the address of the next instruction, and thereby determines program flow,
in several different ways, three of which are shown in Fig 2.4.





Fig 2.4: Examples of Program Flow Charts

The program sequencer removes overhead by keeping track of
program counter incrementing, subroutine handling and
outside-world interrupts. The sequencer thus has the following
main functions:

1) handle normal program flow, incrementing the program counter
by one in each cycle,

2) keep track of subroutine addressing and manage the return
address stack,

3) manage loops in an overhead-free manner controlled by on-chip
loop counters,

4) jump to appropriate exception handler routines when data
overflow is detected,

5) service interrupts from external I/O devices, jumping to
appropriate interrupt service routines.
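The sequencer functions listed above can be illustrated with a small behavioural model. The sketch below is a minimal C illustration under stated assumptions: the operation names (`SEQ_CONT`, `SEQ_CALL`, etc.), the 16-entry stack and the single loop counter are hypothetical and do not reproduce the ADSP-1401 instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical next-address operations; these are not ADSP-1401 opcodes. */
enum seq_op { SEQ_CONT, SEQ_JUMP, SEQ_CALL, SEQ_RET, SEQ_LOOP };

#define STACK_DEPTH 16  /* assumed stack depth, for illustration only */

struct sequencer {
    uint16_t pc;                  /* address of the current microinstruction */
    uint16_t stack[STACK_DEPTH];  /* subroutine return-address stack */
    int sp;                       /* number of entries on the stack */
    int counter;                  /* a single loop counter */
};

/* Compute, store and return the address of the next microinstruction. */
uint16_t seq_next(struct sequencer *s, enum seq_op op, uint16_t target)
{
    uint16_t next = (uint16_t)(s->pc + 1);   /* default: PC + 1 */

    switch (op) {
    case SEQ_CONT:                           /* normal sequential flow */
        break;
    case SEQ_JUMP:                           /* unconditional jump */
        next = target;
        break;
    case SEQ_CALL:                           /* push return address, then jump */
        assert(s->sp < STACK_DEPTH);         /* hardware would raise a stack-overflow interrupt */
        s->stack[s->sp++] = next;
        next = target;
        break;
    case SEQ_RET:                            /* pop the return address */
        assert(s->sp > 0);
        next = s->stack[--s->sp];
        break;
    case SEQ_LOOP:                           /* branch back while the counter is nonzero */
        if (s->counter > 0) {
            s->counter--;
            next = target;
        }
        break;
    }
    s->pc = next;
    return next;
}
```

With the loop counter initialised to c, a `SEQ_LOOP` at the end of a loop body branches back c times, so in this sketch the body executes c + 1 times without any extra bookkeeping instructions.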

In order to accomplish these functions, a modern program
sequencer includes the following components:

- program counter

- conditional logic tester

- stack for
    - subroutine and interrupt return addresses,
    - counter values,
    - jump addresses

- loop counters

Pipeline Latches

Microcode memories are constructed with either ROM or RAM
components. Depending on the depth and width of the microcode memory
needed, many memory chips will need to be cascaded. Accessing
a memory involves the address becoming stable, the location contents
being accessed and the data output becoming stable. Between memory
accesses, while the address and data are settling, the output of the
memory is in an undetermined state. If these undetermined bits drive
the devices of the microcoded circuit, unpredictable circuit operation
results. To prevent this, pipeline latches are required. A D-type
latch is placed at the output of the microcode memory and is clocked
at every rising edge of the system clock, as shown in Fig 2.5.



Fig 2.5: Pipeline Latches


The timing of the microcode memory is such that valid output
exists prior to the rising edge of the clock, so stable information is
loaded into the latch. The latch holds this microcode instruction
until the next clock edge. During this time a new memory location
can be accessed. The process of holding a valid instruction
while the next is being fetched is called pipelining, hence the
name pipeline latch.

The functional unit used as the control unit is the ADSP-1401. The
ADSP-1401 is a high speed microprogram sequencer optimised for the
demanding sequencing tasks found in DSP and general purpose computers.
Its features, such as on-chip storage and control of ten prioritised
and maskable interrupts, four decrementing event counters, absolute,
relative and indirect addressing capability and a dynamically
configurable 64-word RAM, ideally meet the sequencing demands of
any microcoded circuit. The ADSP-1401 contains an internal pipeline
register and can be connected directly to the output of the microcode
memory; no external pipeline latch is required. A detailed
description of the ADSP-1401 is given in Appendix A.

2.1.2 Address Generation Section

Addressing complexity for signal processors can range from
simple integer counting to the more complex sequences of data pointers
occurring in FFTs. In signal processing there are more memory accesses
than there are data points, but the memory access sequence, although
long, is very well structured, which makes possible the design and
use of dedicated address generators. The FFT is a good example of an
algorithm that can use an address generator very effectively. The use
of an address generator relieves the CPU of the mundane task of address
generation and permits the CPU to use all its machine cycles for
arithmetic computations, thus boosting the overall system performance.

Desirable functions of an address generator include:

1) send a precomputed address to data memory, 

2) modify the address by an offset value, 

3) perform logical operations and shifts, 

4) compare a pointer to a preset value, 

5) reset pointer to buffer origin, 

6) reverse the order of bits (needed in FFT) 
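Bit reversal (function 6) is the least obvious of these functions. As an illustrative sketch, not the ADSP-1410's internal method, the C function below reverses the low `bits` bits of an address; for an N = 2^bits point FFT, data index k is exchanged with index `bit_reverse(k, bits)`. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of `addr`; higher bits are discarded. */
uint16_t bit_reverse(uint16_t addr, unsigned bits)
{
    assert(bits <= 16);
    uint16_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (uint16_t)((r << 1) | (addr & 1u));  /* shift the LSB of addr into r */
        addr >>= 1;
    }
    return r;
}
```

For an 8-point FFT (`bits` = 3), index 1 (001) maps to 4 (100) and index 6 (110) maps to 3 (011); a dedicated address generator produces these addresses without spending arithmetic-unit cycles.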

In the general purpose system explained here, a 16-bit data
bus structure is used. Data operands must reside in a data memory, and
all data memories and buses are 16 bits wide. The length of the data
memory is determined by the maximum size of any data record that must
be stored. Here a 1K-word memory is chosen, but this length can be
increased as desired using the architecture description file.

The functional unit chosen here, which includes all the
above address generation features, is the ADSP-1410. The ADSP-1410 is a
fast, flexible address generator which rapidly generates the data
memory addresses required by a wide range of digital signal/array
processors and other high performance computers. The ADSP-1410's
architecture features a 16-bit ALU, a comparator and 30 16-bit
registers. In a single cycle, the device can output a 16-bit memory
address, modify this memory address, detect when the address value
has moved to or beyond a preset boundary and conditionally loop back
to the top of a circular buffer. Consequently, circular buffers and
modulo addressing for data memories can be implemented without
overhead.
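The circular-buffer behaviour described above (output an address, modify it, detect the boundary and loop back) can be sketched in C as follows. The structure and its field names are hypothetical and do not correspond to ADSP-1410 register names; the sketch assumes the pointer starts inside the buffer and the step never exceeds the buffer length, so a single subtraction is enough to wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register set for one circular buffer. */
struct circ_ptr {
    uint16_t addr;    /* current pointer */
    uint16_t base;    /* buffer origin */
    uint16_t length;  /* number of words in the buffer */
};

/* Output the current address, then advance by `step`; when the pointer
   reaches or passes the boundary, loop back to the top of the buffer. */
uint16_t circ_next(struct circ_ptr *p, uint16_t step)
{
    assert(p->length > 0 && step <= p->length);
    uint16_t out = p->addr;
    uint32_t next = (uint32_t)p->addr + step;
    uint32_t limit = (uint32_t)p->base + p->length;
    if (next >= limit)
        next -= p->length;          /* modulo addressing */
    p->addr = (uint16_t)next;
    return out;
}
```

With `base` = 100 and `length` = 4, successive calls with a step of 1 return 100, 101, 102, 103 and then 100 again; the wrap-around costs no extra instructions in the microprogram.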

The ADSP-1410 contains an internal microcode pipeline latch
and can be connected directly to the output of the microcode memory;
no external pipeline latch is required. A detailed description of the
ADSP-1410 chip and its working is given in Appendix B.

Data Buffer

If data values are to be specified directly from the
microprogram instruction, a constant data field is included in the
instruction. Since the data field cannot drive the data bus
directly, it is loaded into a data buffer. To control the
data buffer, one control bit is included in the microprogram
instruction. Whenever this bit is high, the buffer is enabled and the
data bus is loaded with the constant data field of the microprogram
instruction. This is very important while initialising the various
functional units of the microcoded system.

2.1.3 Number Crunching Section

In order to perform high speed arithmetic computations, an
appropriate arithmetic device or devices must be used. The device
should have the desired port structure, high computation speed and
suitable overall features. In the present work the ADSP-1110A is
chosen for this purpose.

The ADSP-1110A is a high speed, low power, single-port 16x16-bit
multiplier/accumulator (MAC) offering advantages such as a
compact 28-pin DIP package, a simple system interface to single-bus
peripherals, reduced cost, and features such as overflow detection and
saturation. This device performs the multiplications, additions and
subtractions which are the core operations in most algorithms. It offers
mixed-mode operations such as multiply/subtract. The chip details and its
working are given in Appendix C. Since the ADSP-1110A has no internal
pipeline latch, an external latch is used to drive the
multiplier/accumulator instruction bits from the microcode instruction memory.
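A behavioural sketch of the multiply, multiply/accumulate and multiply/subtract operations, with saturation on readout, is given below. The 64-bit accumulator and the 32-bit saturated result are assumed widths chosen for illustration; the actual ADSP-1110A register widths and saturation rules should be taken from the chip data (Appendix C). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Behavioural sketch only: widths below are assumptions, not ADSP-1110A values. */
enum mac_op { MAC_MUL, MAC_MAC, MAC_MSUB };

struct mac {
    int64_t acc;   /* accumulator, wide enough not to wrap in practical runs */
};

/* One MAC operation on two 16-bit operands. */
void mac_step(struct mac *m, int16_t x, int16_t y, enum mac_op op)
{
    int64_t p = (int64_t)x * y;   /* exact product of the two operands */
    switch (op) {
    case MAC_MUL:  m->acc  = p; break;   /* multiply */
    case MAC_MAC:  m->acc += p; break;   /* multiply/accumulate */
    case MAC_MSUB: m->acc -= p; break;   /* multiply/subtract */
    default:       assert(0 && "unknown MAC operation");
    }
}

/* Read the result, saturating to the 32-bit range instead of wrapping. */
int32_t mac_result(const struct mac *m)
{
    if (m->acc > INT32_MAX) return INT32_MAX;
    if (m->acc < INT32_MIN) return INT32_MIN;
    return (int32_t)m->acc;
}
```

Saturation matters in DSP because a wrapped result changes sign, whereas a saturated one stays at the nearest representable value: accumulating (-32768)(-32768) twice gives 2^31, which reads out as `INT32_MAX` here.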


The 74LS374 is an output pipeline latch used to latch the output
data whenever the ADSP-1110A executes an output instruction. One control
bit is included in the microinstruction to control the latch
operation. Its operation is shown in Fig 2.6.
Fig 2.6 Functional Table of 74LS374

Bidirectional Buffer

The MAC unit is interfaced to the host data bus through a
bidirectional buffer (74LS245). This avoids bus contention when
more than one MAC unit is connected to the data bus (see Figs 2.9 and
2.10). Two control bits are provided in the microinstruction to
control this device. Its operation is shown in Fig 2.7.


Fig 2.7: Control of Bidirectional Buffer

The functional table of 74LS245 is shown in Fig 2.8.

Fig 2.8: Functional Table of 74LS245
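The 74LS245 function table reduces to a small C function. This sketch assumes that the two microinstruction control bits drive the enable input (G, active low) and the direction input (DIR) directly; which port of the transceiver faces the host bus is not specified here, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Data-flow states of a 74LS245 octal bus transceiver. */
enum ls245_state { LS245_B_TO_A, LS245_A_TO_B, LS245_ISOLATED };

/* g_n is the active-low enable; dir selects the direction when enabled. */
enum ls245_state ls245_eval(bool g_n, bool dir)
{
    if (g_n)
        return LS245_ISOLATED;                 /* G high: both ports high-impedance */
    return dir ? LS245_A_TO_B : LS245_B_TO_A;  /* DIR high: A to B; DIR low: B to A */
}
```

The isolated state is what prevents bus contention: every transceiver not selected by its G bit leaves the shared bus undriven.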





2.1.4 Microcode Memory Width


The number of control bits required by the various
functional units of the system of Fig 2.1 is listed below.

   1) Program sequencer             7
   2) Data field                   16
   3) Data buffer control           1
   4) Data memory control           2
   5) Address generator field      12
   6) MAC field                     8
   7) MAC o/p latch control         1
   8) Bidirectional buffer          2
                                   ---
      Total                        49

Thus 49 microcode bits are required to drive the system
components.
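The width arithmetic can be checked with a short C sketch that sums the field widths of System 1 and adds the extra fields of Systems 2 and 3. The address-generator field is taken as 12 bits, the value consistent with the 60- and 74-bit totals quoted for Systems 2 and 3. The breakdown of the 25 extra bits of System 3 (a second data memory control, address generator, MAC, MAC o/p latch and bidirectional buffer field) is an assumption that reproduces the quoted 74 bits.

```c
#include <assert.h>
#include <stddef.h>

struct field { const char *name; int bits; };

/* System 1 fields; the 12-bit address-generator width is inferred. */
static const struct field system1[] = {
    { "program sequencer",      7 },
    { "data field",            16 },
    { "data buffer control",    1 },
    { "data memory control",    2 },
    { "address generator",     12 },
    { "MAC",                    8 },
    { "MAC o/p latch control",  1 },
    { "bidirectional buffer",   2 },
};

int total_bits(const struct field *f, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += f[i].bits;
    return sum;
}

/* Microcode width of System 1, 2 or 3. */
int microcode_width(int system)
{
    int base = total_bits(system1, sizeof system1 / sizeof system1[0]);
    switch (system) {
    case 1: return base;
    case 2: return base + 8 + 1 + 2;            /* second MAC, its o/p latch, tri-state buffer */
    case 3: return base + 2 + 12 + 8 + 1 + 2;   /* assumed duplicated fields of the second unit */
    default: assert(0 && "no such system"); return -1;
    }
}
```

Only the fields tied to a second unit grow; the sequencer, data field and data buffer control are shared, which is why System 3 needs 74 bits rather than twice 49.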


2.2 System 2: System with Two Processors and a Shared Memory


One of the advantages of microcoded systems is that many
arithmetic devices can be organised to function in parallel to
increase the throughput of the system. Fig 2.9 shows an architecture
having two MAC units sharing one memory unit via an internal data
bus. This internal data bus is separated from the host data bus by a
bidirectional buffer (74LS245). The MACs in turn are connected to the
internal data bus via tri-state buffers (74LS245). This ensures
that both MACs do not try to write to the data memory simultaneously.
The inclusion of the second MAC with its glue logic increases the
microcode width to 60 bits (MAC #2 needs 8, its o/p latch needs 1 and
its tri-state buffer needs 2).

Fig 2.9 SYSTEM 2: System with Two Processors and a Shared Memory


2.3 System 3: System with Two Processors and Two Memories


In System 2 the two processors cannot access different data
items in the memory simultaneously: because of the common-bus structure,
one of them must wait until the other has accessed its data. This
seriously limits the throughput of the system even though the system
has two processors. The architecture of Fig 2.10 attempts to overcome
this limitation by providing separate data memories for the two MAC
units. This architecture is essentially a duplication of the unit of
Fig 2.1, with the two units now connected to the host data bus via
tri-state bidirectional buffers. The microcode width now becomes 74
bits.

In the next chapter the simulation of the architectures
described here is discussed, and in Chapter 4 their performance is
evaluated with a few examples.



Fig 2.10 SYSTEM 3: System with Two Processors and Two Memories













CHAPTER 3 


SIMULATOR 


3.1 Introduction

A simulator is a software tool which aids the user in program
development and checking. The simulator permits programs to be
fine-tuned in software prior to making the significant effort to bring
up the target system in hardware. The simulator behaves as much like the
target system as the nature of software permits. Programs under
development are run on a host machine. For simulating input/output,
physical I/O devices are replaced by host memory locations.
Breakpoints can readily be set so that, when a program arrives at a
specified program or data location, all register contents at the
breakpoint are displayed, or a trace of system activity before the
breakpoint is shown. Sections that cause a crash or dead-ended loops
can readily be identified. The host's memory plays the role of memory
in the target system, acting as ROM when so specified. Even interrupts
can be simulated by timers set to trigger a simulated interrupt line.

3.2 Simulator Blocks 


The main building blocks of the simulator are the simulations of the

(i) sequencer (ADSP-1401),

(ii) address generator (ADSP-1410),

(iii) arithmetic unit (ADSP-1110A).

For simulation of the sequencer and the address generator, we
followed the approach given by Wandhekar [2]. In what follows, we
give a brief description of the simulation methodology for these units.


3.2.1 Simulation of the Sequencer (ADSP-1401)

The details of the chip are given in Appendix A. All the
features of the chip are simulated. Each sequencer instruction
is simulated using a separate routine. Registers and signals are
represented as global variables. The simulation process is
as follows.

In an instruction cycle, the routine for the instruction
corresponding to the opcode presented to the sequencer is executed first.
The program then checks for an interrupt request; if a request is
pending, the program counter is loaded with the interrupt vector. At the
end of the cycle, the program checks the interrupt pins and interrupt
sources; if any of these sources is active, a request is raised.

The working of the sequencer is as shown in the flow chart of Fig 3.1. 
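The cycle just described (execute the instruction routine, service a pending request, then poll the interrupt sources) can be sketched in C with a table of per-opcode routines. The two opcodes, the structure layout and the rule that the highest-numbered active source wins are hypothetical simplifications; interrupt masking and the ADSP-1401's real priority scheme are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N_IRQ 10   /* ten interrupt levels */

struct seq_state {
    uint16_t pc;
    uint16_t vector[N_IRQ];   /* interrupt vector addresses */
    bool pin[N_IRQ];          /* active interrupt sources */
    int pending;              /* pending request level, or -1 if none */
};

typedef void (*seq_handler)(struct seq_state *);

/* Two hypothetical instruction routines, one per opcode. */
static void op_continue(struct seq_state *s) { s->pc++; }
static void op_hold(struct seq_state *s)     { (void)s; }

static const seq_handler handlers[] = { op_continue, op_hold };

void seq_cycle(struct seq_state *s, unsigned opcode)
{
    assert(opcode < sizeof handlers / sizeof handlers[0]);
    handlers[opcode](s);                     /* 1. execute the instruction routine */

    if (s->pending >= 0) {                   /* 2. service a pending request */
        s->pc = s->vector[s->pending];
        s->pending = -1;
    }

    for (int lvl = N_IRQ - 1; lvl >= 0; lvl--) {   /* 3. poll the sources */
        if (s->pin[lvl]) {
            s->pending = lvl;                /* assumed: highest level wins */
            s->pin[lvl] = false;
            break;
        }
    }
}
```

Because the sources are polled at the end of a cycle, a request raised in one cycle redirects the program counter in the next cycle.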

The ADSP-1401 processes eight external and two internal
interrupts. The two internal interrupts are reserved for stack
overflow (IR9) and counter underflow (IR0). The simulator has internal
cycle counters, which can be set to model interrupting devices. The user
can activate any of the eight external interrupt sources (IR1-IR8) by
specifying the interval, in cycles, between two consecutive interrupts.
When the internal timer expires, an interrupt is issued at the
corresponding level. To disable the interrupt, the interrupt period is
set to zero.

Fig 3.1 Simulation of the Sequencer ADSP-1401
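The timer-driven interrupt model can be sketched with one cycle counter per external level. In this minimal C sketch (all names hypothetical), each enabled level counts cycles and fires when its programmed period is reached; a period of zero disables the level, as in the simulator.

```c
#include <assert.h>
#include <stdbool.h>

#define N_EXT 8   /* external levels IR1..IR8; index 0 is unused */

struct irq_timer {
    unsigned period[N_EXT + 1];  /* 0 = disabled */
    unsigned count[N_EXT + 1];   /* cycles since the last interrupt */
};

/* Program level `lvl` to fire every `period` cycles (0 disables it). */
void irq_set(struct irq_timer *t, int lvl, unsigned period)
{
    assert(lvl >= 1 && lvl <= N_EXT);
    t->period[lvl] = period;
    t->count[lvl] = 0;
}

/* Advance one cycle; fired[lvl] is set for each level whose interval expired. */
void irq_tick(struct irq_timer *t, bool fired[N_EXT + 1])
{
    for (int lvl = 1; lvl <= N_EXT; lvl++) {
        fired[lvl] = false;
        if (t->period[lvl] == 0)
            continue;                        /* level disabled */
        if (++t->count[lvl] == t->period[lvl]) {
            t->count[lvl] = 0;               /* restart the interval */
            fired[lvl] = true;
        }
    }
}
```

With level 2 programmed for a period of 3 cycles, nine calls to `irq_tick` fire it three times; setting the period back to zero stops it.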

The other major features of the sequencer simulation are:

(i) an internal 64-word RAM implementing two distinct stacks, a
subroutine stack and a register stack; when stack overflow is
detected, interrupt IR9 is raised;

(ii) four independent counters used for maintaining loops and
event tracking;

(iii) when the sign bit of the status register is set, IR0 is raised.


3.2.2 Simulation of Address Generator (ADSP-1410)

The simulation process is similar to that of the sequencer. In every
cycle the routine for the instruction corresponding to the opcode
for the address generator is executed. The instruction set and details
of the address generator chip are given in Appendix B.


3.2.3 Simulation of the Arithmetic Unit (ADSP-1110A)

The simulation of the arithmetic unit is also similar to that of the
sequencer and the address generator. The instruction set and details of
the chip are given in Appendix C.



3.3 The Working of the Simulator


As shown in Fig 3.2, the simulator reads the architecture
description file, and the object code file and symbol table file output
by the meta-assembler. An architecture description file is input to the
simulator to start a simulator session. The object code file is loaded
interactively. The symbol table file is loaded implicitly when the
object file is loaded. The definition file is loaded implicitly when
the simulator is run.

By reading the debug symbol table (created by the
meta-assembler), the simulator interacts with the user symbolically.
The user can refer to variables and program labels using the symbols
defined in the user program, avoiding the need to translate symbols
into numeric addresses. The simulator can disassemble microinstructions,
making full use of the symbols defined in the user program.

The simulator is interactive. By reading the architecture file,
it configures itself to the target system architecture. The user can
download data memory contents from the terminal, and can also upload
the data memory contents into a file for subsequent analysis. The user
can also observe the functioning of the various components of the system
by displaying the register/memory contents of the sequencer, address
generator, arithmetic unit, microprogram memory and data memory.




Fig 3.2 Simulator inputs and outputs. 


A user working on the HP-9000 should first compile the simulator
and create an a.out file; a simulator session is started by executing
the a.out file. A user working on a PC should first compile the
simulator in Turbo C and create an executable file for the simulator;
typing MSIM <cr> then starts a simulator session.

The main operation of the simulator is shown in Fig 3.3. 

As explained earlier the inputs to the simulator are the 
architecture description file and the definition file. These files are 


explained below. 





Fig 3.3 The Main Program Flow of the Simulator.







Architecture Description File 


A valid architecture description file starts with '@' and ends
with '@'. In the architecture description file the user can specify the
type of architecture needed, i.e. the length of the data memory and of
the program memory. If either of these parameters is not specified, a
minimum default value is assumed. For example, if the user wants the
following specification,

D-mem length 2000
P-mem length 3000

it has to be input as follows:

@ 

dmem 2000 
pmem 3000 
@ 
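A parser for this format might look as follows. This is a hypothetical sketch, not the simulator's code: the keyword spellings follow the example above, unknown keywords are skipped, and the 1K-word default sizes are assumptions, since the text only states that a minimum default value is used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical defaults. */
#define DEFAULT_DMEM 1024L
#define DEFAULT_PMEM 1024L

struct arch { long dmem; long pmem; };

/* Parse the "keyword value" pairs between the two '@' delimiters.
   Returns 0 on success, -1 if a delimiter is missing or the body is too long. */
int parse_arch(const char *text, struct arch *a)
{
    assert(text != NULL && a != NULL);
    a->dmem = DEFAULT_DMEM;
    a->pmem = DEFAULT_PMEM;

    const char *start = strchr(text, '@');
    if (start == NULL) return -1;
    const char *end = strchr(start + 1, '@');
    if (end == NULL) return -1;

    char body[256];
    size_t len = (size_t)(end - start - 1);
    if (len >= sizeof body) return -1;
    memcpy(body, start + 1, len);        /* copy the text between the '@'s */
    body[len] = '\0';

    const char *p = body;
    char key[16];
    long value;
    int used;
    while (sscanf(p, " %15s %ld%n", key, &value, &used) == 2) {
        if (strcmp(key, "dmem") == 0)
            a->dmem = value;
        else if (strcmp(key, "pmem") == 0)
            a->pmem = value;
        p += used;                       /* move past this pair */
    }
    return 0;
}
```

Applied to the example above, the parser yields a data memory of 2000 words and a program memory of 3000 words; omitting the `dmem` line leaves the data memory at the default.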

Definition Files 


The simulator treats the microinstruction as a set of fields.
The information about each field, i.e. field number, field length,
beginning and end of each field etc., is read by the simulator from
definition files. These files are created by the meta-assembler.



3.4 Simulator Commands


After reading the architecture description file and the definition
file, the simulator allocates memory to the various components of the
system, i.e. sequencer, address generator, data memory, program memory,
etc.

The simulator then prints the simulator prompt, MSIM>, and waits
for commands.

The simulator executes the various commands present in a command
table. When a command is given, the simulator searches the command
table and executes that particular command. The user can list the
commands of the simulator by means of the help command.
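The command-table lookup can be sketched in C as a table of name/function pairs searched with `strcmp`. The handlers below are hypothetical stubs that only return a code; the simulator's real routines are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmd_fn)(void);

/* Stub handlers standing in for the simulator's routines. */
static int cmd_run(void)  { return 1; }
static int cmd_ss(void)   { return 2; }
static int cmd_help(void) { return 3; }

struct command { const char *name; cmd_fn fn; };

static const struct command cmd_table[] = {
    { "run",  cmd_run  },
    { "ss",   cmd_ss   },
    { "help", cmd_help },
};

/* Search the table for `name` and run the command; -1 if unknown. */
int dispatch(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof cmd_table / sizeof cmd_table[0]; i++)
        if (strcmp(cmd_table[i].name, name) == 0)
            return cmd_table[i].fn();
    return -1;
}
```

Keeping the commands in a table means that adding a command requires only a new table entry, and a help command can list the commands simply by walking the same table.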


3.4.1 Loading Data Memory and Program Memory

In order to execute a microprogram, the program memory of the
simulator has to be loaded with the microprogram. This is done by
executing the command lcode when the simulator prompts MSIM>. The
simulator then asks for the object microprogram, and the user should
provide the object file created by the meta-assembler.

The data memory can be loaded by executing the command ldm.

For example:

To load the simulator program memory with microcode, the command is

MSIM>lcode <cr>
-file: micl.obj <cr>

To load the data memory with integer values, the command is

MSIM>ldm <cr>
-Address: 0 <cr>
-length: 5 <cr>
1
f2h
4o
1011b
200 <cr>

Thus binary (suffix b), octal (suffix o), decimal (no suffix) and
hexadecimal (suffix h) inputs can be given.
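The suffix convention in this example (h for hexadecimal, o for octal, b for binary, none for decimal) can be decoded as in the sketch below. The function name is hypothetical. In this sketch a hexadecimal value must carry the h suffix, so `1bh` is read as hexadecimal and `1b` as binary. Overflow checking is omitted, which is acceptable for 16-bit data values.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Convert one data-memory entry; returns false on an invalid digit. */
bool parse_value(const char *s, long *out)
{
    assert(s != NULL && out != NULL);
    size_t len = strlen(s);
    if (len == 0) return false;

    int base = 10;                                   /* no suffix: decimal */
    char last = (char)tolower((unsigned char)s[len - 1]);
    if (last == 'h')      { base = 16; len--; }
    else if (last == 'o') { base = 8;  len--; }
    else if (last == 'b') { base = 2;  len--; }
    if (len == 0) return false;

    long v = 0;
    for (size_t i = 0; i < len; i++) {
        int c = tolower((unsigned char)s[i]);
        int d;
        if (isdigit(c)) d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else return false;
        if (d >= base) return false;                 /* digit not valid in this base */
        v = v * base + d;
    }
    *out = v;
    return true;
}
```

For the five values entered above, this gives 1, 242, 4, 11 and 200.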

3.4.2 Interrupts

The simulator has internal cycle counters to model interrupting
devices. The user can activate any one of the eight interrupts by
specifying its number and the period between two successive interrupts.
After this period, the program counter value of the sequencer is
replaced with the corresponding interrupt vector. At startup of the
simulator, all interrupts are disabled. To enable an interrupt, the
setint command is given as shown below:

MSIM>setint <cr>
-number: 2 <cr>
-period (in cycles): 25 <cr>

To disable the interrupt, the period should be set to zero.

3.4.3 Running the Simulator

After supplying the simulator with all the necessary inputs,
such as loading the program memory and data memory and setting
interrupts, the microprogram can be executed.

The simulator takes the microinstruction in hex code, converts
it into binary bits and places the microinstruction in an array of
characters. As explained earlier, each instruction consists of a set
of fields. Using the structure of each field, the simulator separates
each field and then executes the respective field's function. The
steps followed by the simulator when the run command is given are
shown in Figs 3.4.a, b, c and d (for the system with one arithmetic
unit and one RAM).
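The hex-to-bits conversion and field separation can be sketched as follows. The field descriptor (start bit counted from the leftmost bit, plus a length) is an assumed stand-in for the information read from the definition files, and the names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_BITS 128

/* Hypothetical field descriptor: bit 0 is the leftmost bit. */
struct field_def { int start; int length; };

/* Expand a hex string into an array of '0'/'1' characters.
   Returns the number of bits, or -1 on a bad digit or overflow. */
int hex_to_bits(const char *hex, char bits[MAX_BITS])
{
    int n = 0;
    for (const char *p = hex; *p; p++) {
        int c = tolower((unsigned char)*p);
        int v;
        if (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else return -1;
        if (n + 4 > MAX_BITS) return -1;
        for (int k = 3; k >= 0; k--)                 /* most significant bit first */
            bits[n++] = (char)('0' + ((v >> k) & 1));
    }
    return n;
}

/* Collect the bits of one field into an unsigned value. */
unsigned long field_value(const char *bits, int nbits, struct field_def f)
{
    assert(f.start >= 0 && f.length > 0 && f.start + f.length <= nbits);
    unsigned long v = 0;
    for (int i = f.start; i < f.start + f.length; i++)
        v = (v << 1) | (unsigned long)(bits[i] - '0');
    return v;
}
```

For the microinstruction `A5F` (1010 0101 1111), a 4-bit field starting at bit 2 extracts 1001, i.e. 9; fields need not line up with hex digits, which is why the bits are expanded first.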

The simulator can also execute one microinstruction at a time
and halt after completion of that microinstruction (i.e. single
stepping). To do this, the command to be given is:


MSIM> ss <cr> 




Fig 3.4.a Execution of 'run' Command







Fig 3.4.b Execution of Microinstruction









Fig 3.4.c Execution of Step (b) of the execution cycle: organise each
field's value in alphabetical order; load the data buffer with the data
field's value; execute the data buffer control function; execute the
sequencer function; execute the bidirectional buffer function.







Fig 3.4.d Continuation of Step (b)







3.4.4 Displaying Register/Memory Contents

The simulator can display 

- data memory 

- program memory 

- sequencer register/memory contents

- address generator register contents

- arithmetic unit contents


The various display commands are:

1. MSIM>ddm <cr>
   -Address:
   ; displays 10 locations of data memory on the screen.

2. MSIM>dpm <cr>
   -Address:
   ; displays all the defined program memory contents starting from
     the given address.

3. MSIM>d_seq <cr>
   ; displays the contents of all the counters, RAM locations, program
     counter and stack pointers of the sequencer.

4. MSIM>d_agr <cr>
   ; displays the contents of all the registers of the address generator.

5. MSIM>d_mac <cr>
   ; displays the arithmetic unit's contents.

6. MSIM>dispint <cr>
   ; displays the interrupt vector number and period.


The data memory contents of the simulator can be dumped into a
file for further analysis with the command:

MSIM>dumpdm <cr>
-address: ... <cr>
-length: ... <cr>
-file: ... <cr>


3.4.5 Ending a Simulator Session


The user can end a simulator session with the command:

MSIM>exit <cr>
Are you sure: y or Y <cr>

The user can also leave the simulator by pressing Control-C.


3.5 Simulator Commands Summary

Loading the simulator:

ldm     - loads data memory,
ldlm    - loads second data memory
          (only for the system with 2 memory components),
lcode   - loads the program memory with the microprogram object file.

Interrupts:

setint  - sets an interrupt (number and period) for the sequencer,
dispint - displays the interrupt vectors.

Running the simulator:

run     - runs the microprogram until it completes,
ss      - runs a single microinstruction and halts,
setpc   - sets the program counter to the required location.

Display:

ddm     - displays data memory contents,
ddlm    - displays second data memory contents
          (only for the system with 2 memory components),
dpm     - displays all the defined program memory contents
          starting from the given address,
d_seq   - displays the sequencer's contents,
d_agr   - displays address generator contents,
d_mac   - displays arithmetic unit contents,
dumpdm  - dumps data memory contents into a file,
dumpldm - dumps second data memory contents into a file.

cycle   - displays the cycle count of the simulator.

help    - displays a listing of the simulator commands.

exit    - exits from the simulator session.



CHAPTER 4 


TESTING THE SIMULATOR 


In this chapter, microprograms for two algorithms, namely
1-D convolution and matrix multiplication, are developed to test the
three proposed microcoded systems using the meta-assembler
(described in Section 4.4), which converts microprograms to machine
code.


4.1 System 1: One memory unit and one MAC 


4.1.1 Algorithm 1: Matrix Multiplication


Consider $C = AB$, where $A$ is $M \times N$ and $B$ is $N \times K$, so
that $C$ is $M \times K$. For the purpose of illustration, let $A$ be
$(4 \times 4)$ and $B$ be $(4 \times 6)$. The matrix $A = [a_{m,n}]$ is
stored in data memory row-wise, while matrix $B = [b_{n,k}]$ is stored in
the same memory column-wise [Fig 4.1].

The ndoroproi^ram shown in Pif; 4.2 performs the followini; 
cxMi^utation : 


DO 1 for m - 0 to CM-1> 
DO 2 for k » 0 to CK-1> 
SUM - 0 

DO 3 for n -> 0 to CN-1> 


s . » SUM + a b . 

m,n n,k 



3 CONTINUE 


2 CONTINUE 
1 CONTINUE 
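The loop nest above can be checked against a software model of the memory layout described in this section. The Python sketch below is an illustration under stated assumptions, not the microprogram itself: it places A row-wise from 50h, B column-wise from 100h and C row-wise from 150h (the base addresses loaded during initialisation in Table 4.1), and walks three pointers in the way r0, r1 and r2 are described as doing. The function name matmul_flat_memory is hypothetical, and the default base addresses only suit small cases such as the 4x4 by 4x6 example; larger operands would overlap in memory and need other bases.

```python
def matmul_flat_memory(a, b, a_base=0x50, b_base=0x100, c_base=0x150):
    """C = AB using a flat word-addressed memory and three walking pointers.

    A (M x N) is stored row-wise from a_base, B (N x K) column-wise from
    b_base, and C (M x K) row-wise from c_base. Returns (C, mac_count).
    """
    m_rows, n = len(a), len(a[0])
    k_cols = len(b[0])
    mem = {}
    for m in range(m_rows):                   # A stored row-wise
        for j in range(n):
            mem[a_base + m * n + j] = a[m][j]
    for k in range(k_cols):                   # B stored column-wise
        for j in range(n):
            mem[b_base + k * n + j] = b[j][k]

    macs = 0
    p_c = c_base                              # role of r2: pointer to c(m,k)
    for m in range(m_rows):
        row_start = a_base + m * n            # start of row m of A
        p_b = b_base                          # role of r1: back to column 0 of B
        for _ in range(k_cols):
            p_a = row_start                   # role of r0: restart row m of A
            acc = 0                           # SUM = 0
            for _ in range(n):
                acc += mem[p_a] * mem[p_b]    # one MAC operation
                p_a += 1
                p_b += 1                      # runs down column k, then into k+1
                macs += 1
            mem[p_c] = acc                    # write c(m,k)
            p_c += 1
    c = [[mem[c_base + m * k_cols + k] for k in range(k_cols)]
         for m in range(m_rows)]
    return c, macs


A = [[m * 4 + j + 1 for j in range(4)] for m in range(4)]    # 4x4
B = [[(j + 2 * k) % 5 for k in range(6)] for j in range(4)]  # 4x6
C, macs = matmul_flat_memory(A, B)
print(macs)  # 96
```

Because B is stored column-wise, the B pointer never has to be reloaded between successive elements of a row; it simply runs on into the next column, which is why only the A pointer is re-initialised per element.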

Table 4.1 shows the microoperations on a cycle-by-cycle basis. The major steps can be summarised as follows:

i) Cycles 1-10 involve:

   . initialisation of pointers to the memory locations corresponding to a_{m,n}, b_{n,k} and c_{m,k}, respectively;
   . setting the loop counters for n, k and m;
   . other bookkeeping operations.

ii) Actual computation starts in cycle 11.

   cycle
   11 (1)   Load a_00 to X-reg.
   12 (2)   Load b_00 to Y-reg; (X) -> (XD); initiate MAC operation.
            [Note: operands have to be fed in successive cycles because
            the ADSP 1110A has a single I/O port.]
   14 (4)   Result (a_00 b_00 + 0) is clocked into MR-reg and the MAC
            operation continues.

Fig 4.1 


   20 (10)  Load zero to Y-reg (SUM = 0); result c_00 -> (MR-reg)
            [initialising for the next c_{m,k}].
   22 (12)  (MR) = c_00 is output to BUS and written into memory.
            [Computation of each c_{m,k} effectively takes 12 cycles.]

iii) Initialising for the next element c_{m,k}:




Table 4.1
MATRIX Vs MATRIX
M = 4, N = 4, K = 6

Cycle   Microoperation                               Comment
Initialisation
 1      Data = 50h   -> (b0)(AG)                     ; initialise b0-reg
 2      Data = 100h  -> (i1)(AG)                     ; initialise i1-reg
 3      Data = 150h  -> (r2)(AG)                     ; initialise r2-reg, which acts as pointer to c
 4      Data = N     -> (b1)(AG)                     ; load b1-reg with offset = N
 5      Data = 0     -> (LS)(MAC)                    ; initialise LS-reg of MAC
 6      Data = 1     -> LSP of SEQ                   ; initialise LSP to 1
 7      Data = label4 -> RAM(1)(SEQ)                 ; push label4 to stack
 8      Data = M-2   -> (c2)(SEQ)                    ; set counter c2 to M-2 for looping M times
 9      Data = K-2   -> (c1)(SEQ) & (i1)(AG) -> (r1)(AG)
                                                     ; set counter c1 to K-2 for looping K times;
                                                     ; r1 acts as pointer to b
10      Data = N-2   -> (c0)(SEQ) & (b0)(AG) -> (r0)(AG)
                                                     ; set counter c0 to N-2 for looping N times;
                                                     ; r0 acts as pointer to a
Computation starts (cycles 11-22, see step ii)
19(9)   Continues to next sequential microprogram location
23      Jumps unconditionally to label3, K times
93      Decrements counter c2 and (b0)(AG) -> (r5)(AG)
94      (b1)(AG) is added to (r5)(AG) & checks c2 for completion of all rows
95      Repeats the whole operation M times by jumping unconditionally
        to label5 & (r5)(AG) -> (b0)(AG)
p_org   100h
d_org   50h
M       equ 4
N       equ 4
K       equ 6

        2 50h & enable & yrtb(b0)(r0) & dsel
        2 100h & enable & dti(i1)
        2 150h & enable & btr(b3)(r2)
        2 N & enable & yrtb(b1)(r0) & dsel
        enable & rdlatch & ls=bus
        2 1 & enable & wrrsp
        2 label4 & enable & psdss
        2 M-2 & enable & wrcntr(c2)
label5: wrcntr(c1) & enable & 2 K-2 & itr(i1)(r1)
label3: wrcntr(c0) & enable & 2 N-2 & btr(b0)(r0)
label2: yinc(c0)(r0) & rd & x=bus
        yinc(c1)(r1) & rd & y=bus_ckmr_xus*yus+mr
            & branch(unconditional)(c0) & 2 label2 & enable
label4: cont
        enable & y=bus_ckmr_xus*yus & dccntr(c1) & rdlatch
        cont
        bus=ls & macenable & yinc(c2)(r2) & wr & jtwo(sign)(c1)
        jda(unconditional) & 2 label3 & enable
        dccntr(c2) & btr(b0)(r5)
        jtwo(sign)(c2) & yadd(c0)(b1)(r5)
        jda(unconditional) & 2 label5 & enable & yrtb(b0)(r5)

Fig 4.2 Microprogram for Matrix multiplication for System 1


   23       Sequencer generates next microinstruction address = label3.
   24       Microoperations in label3 are performed:
            . set up loop counter for N: (N-2) -> (counter c0);
            . initialise pointer to start of next column.

iv) Computation of the next c_{m,k}, i.e. c_01, starts.

   25       Similar to cycles 10-21.
   ...
   36       (MR) = c_01 -> BUS (-> memory).

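   37-38    As cycles 23-24, for the next element.
   39-50    Computation of c_02.
   51-52    As cycles 23-24.
   53-64    Computation of c_03.
   65-66    As cycles 23-24.
   67-78    Computation of c_04.
   79-80    As cycles 23-24.
   81-92    Computation of c_05.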


v) After completion of all K elements (k = 0 to K-1) of a row (corresponding to the unconditional jump to label3, i.e. cycle 23 of Table 4.1), computation of the next row of C starts.

   93       Counter c2 (loop counter for m) is decremented, and
            (b0) (= 50h for the first row) -> (r5).
   94       Check counter c2 for completion of all rows, and (b1) = N is
            added to (r5): the start address of the next row is generated
            by adding offset = N.
   95       If m < M, jump to label5 and (r5) -> (b0):
            b0 now points to the start of the next row.

Results & discussion

1. i) For a (4x4) x (4x6) multiplication, each c_{m,k} element requires 4 MAC operations, resulting in a total of (4 MAC/element) x 24 elements = 96 MAC operations.

   ii) A total of 355 clock cycles are required to obtain the product matrix C. Assuming a clock cycle of 100 nsec (the ADSP 1110A can operate with a clock period as short as 50 nsec),

$$ \text{Throughput achieved} = \frac{\text{Total No. of MAC operations}}{\text{clock period} \times \text{No. of cycles}} = \frac{96}{100 \times 10^{-9} \times 355} \approx 2.71\ \text{MOPS} $$

2. i) For a [10x10] x [10x20] multiplication,
      MAC operations = (10 MAC/element) x 200 elements = 2000.

   ii) Total No. of cycles required = 5237.

$$ \text{Throughput achieved} = \frac{2000}{100 \times 10^{-9} \times 5237} \approx 3.82\ \text{MOPS} $$
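The throughput figures above follow from dividing the MAC count by the total execution time (number of cycles times the clock period). A small Python helper makes the arithmetic explicit; the function name is illustrative, and the 100 ns default simply mirrors the clock assumed in the text.

```python
def throughput_mops(mac_ops, cycles, clock_ns=100.0):
    """Throughput in MOPS: MAC operations divided by execution time.

    Execution time = cycles * clock period (clock_ns nanoseconds).
    """
    seconds = cycles * clock_ns * 1e-9
    return mac_ops / seconds / 1e6


print(round(throughput_mops(2000, 5237), 2))               # 3.82
print(round(throughput_mops(2000, 5237, clock_ns=50), 2))  # 7.64
```

The second call shows that, for an unchanged cycle count, the 50 nsec clock period quoted for the ADSP 1110A would double the figure. This is only an upper bound: the ADSP-1401 sequencer data sheet reproduced in the appendix lists a 70 ns cycle time, so the other system components may not keep pace at that rate.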


4.1.2 Algorithm 2: 1-D Convolution

The convolution of two sequences

   x(n) = { x(0), x(1), ..., x(M-1) },
   h(n) = { h(0), h(1), ..., h(N-1) }

is given by

$$ y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k), \qquad n = 0, 1, \ldots, M+N-2, $$

where x(n) = 0 for n < 0 and for n > M-1, and h(n) = 0 for n < 0 and for n > N-1.
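The definition above can be written as a direct-form reference model. In this Python sketch (the function name convolve_direct is illustrative) every output sample is computed with the same N-term loop, treating samples outside x as zero, so the MAC count is N(M+N-1); this reproduces the 100 and 390 MAC totals quoted in the results for the [5x16] and [10x30] cases.

```python
def convolve_direct(x, h):
    """y(n) = sum_{k=0}^{N-1} h(k) x(n-k), n = 0..M+N-2, x zero outside 0..M-1.

    Returns (y, mac_count); each output performs N MACs, including the
    zero-valued edge terms.
    """
    M, N = len(x), len(h)
    y, macs = [], 0
    for n in range(M + N - 1):
        acc = 0
        for k in range(N):
            idx = n - k
            acc += h[k] * (x[idx] if 0 <= idx < M else 0)  # zero padding
            macs += 1
        y.append(acc)
    return y, macs


print(convolve_direct([1, 2, 3], [1, 1]))      # ([1, 3, 5, 3], 8)
print(convolve_direct([0] * 16, [0] * 5)[1])   # 100
```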

The h(n) and x(n) elements are stored in data memory starting at locations 50h and 100h, respectively. The microprogram for this algorithm is shown in Fig 4.3, and the corresponding cycle-by-cycle operations are explained in Table 4.2.

The microprogram shown in Fig 4.3 performs the following operations:

(i) Initialisation:

   Cycles 1-8 involve:

   . setting the loop counters for M+N-1 and N;
   . initialising the LS-reg of the MAC to zero;
   . other bookkeeping operations.

(ii) Computation:

   Actual computation starts in cycle 9.


Table 4.2
1-D Convolution
y(k) = x(n) * h(n);  M = 16, N = 5, K = M + N - 1

Cycle    Microoperation
19(11)   Data = N+1 is added to (r1)(AG) & decrements c1.
21(13)   Continues to next sequential microprogram location.
[The remaining entries of rows 15(7)-22(14) are not legible in the source.]

   9 (1)    Loads h_0 to X-reg.
   10 (2)   Loads x_0 to Y-reg; X -> XD; initiation of MAC operation.
   12 (4)   Result (h_0 x_0 + 0) is clocked into MR-reg and the MAC
            operation continues.
   19 (11)  Data = N + 1 is added to (r1), which then points to x_1 for
            computation of the next element, y_1.
   20 (12)  Load zero to Y-reg; result y_0 -> MR-reg
            [initialisation for the next y_n].
   22 (14)  MR = y_0 is output to BUS and written into memory.
            [Computation of each y_n effectively takes 14 cycles.]

(iii) Initialisation for the next element (y_n):

   23       Sequencer generates NEXT microinstruction address = label1.
   24       Microoperations in label1 are performed, i.e.
            . set up loop counter for N: (N-2) -> (c0);
            . initialise pointer to the starting location of the
              coefficient elements h(n).

(iv) Computation of the next element, y_1, starts.

   25       (Similar to cycles 8-21.)
   38       (MR) = y_1 -> BUS -> Memory.
   39-40    As cycles 23-24, for the next element.
   41-54    Computation of y_2; (MR) = y_2 -> BUS -> Memory.
   ...      (and so on, up to cycle 326)



p_org   50h
d_org   50h
N       equ 5
M       equ 16

        2 50h & enable & dti(i0)
        2 100h & enable & btr(b3)(r1)
        2 150h & enable & btr(b3)(r2)
        enable & rdlatch & ls=bus
        2 1 & enable & wrrsp
        2 label3 & enable & psdss
        wrcntr(c1) & 2 M+N-3 & enable
label1: wrcntr(c0) & 2 N-2 & enable & itr(i0)(r0)
label2: yinc(c0)(r0) & rd & x=bus
        branch(unconditional)(c0) & ydec(c1)(r1) & rd & 2 label2
            & enable & y=bus_ckmr_xus*yus+mr
label3: 2 N+1 & enable & yadd(c0)(b3)(r1) & dccntr(c1)
        enable & y=bus_ckmr_xus*yus & rdlatch
        cont
        bus=ls & macenable & jtwo(sign)(c1) & yinc(c2)(r2) & wr
        jda(unconditional) & 2 label1 & enable

Fig 4.3: Microprogram for 1-D convolution for System 1



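Results & Discussion

1 (i) For a [5x16] convolution, each element y_n requires 5 MAC operations (every output is computed with the same N-iteration loop, so the zero-valued terms at the sequence edges are processed as well), resulting in a total of (5 MAC/element) x 20 = 100 MAC operations.

  (ii) A total of 326 clock cycles are required to obtain the convolved sequence y_n:

$$ \text{Throughput} = \frac{100}{100 \times 10^{-9} \times 326} \approx 3.08\ \text{MOPS} $$

2 (i) For a [10x30] convolution, total MAC operations = 390.

  (ii) Total number of clock cycles = 1020.

$$ \text{Throughput achieved} = \frac{390}{100 \times 10^{-9} \times 1020} \approx 3.82\ \text{MOPS} $$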


4.2 System 2: One Memory, Two MACs

4.2.1 Algorithm 1: Matrix Multiplication

In this architecture, there are two processors and one shared memory unit connected via a common bus. In order to utilise the two processors effectively, two elements c_{m,k} and c_{m,k+1} of a row are calculated simultaneously. For example,

$$ c_{00} = a_{00}b_{00} + a_{01}b_{10} + a_{02}b_{20} + a_{03}b_{30} $$

$$ c_{01} = a_{00}b_{01} + a_{01}b_{11} + a_{02}b_{21} + a_{03}b_{31} $$


p_org   50h
d_org   50h
M       equ 4
N       equ 4
K       equ 6

        2 50h & enable & yrtb(b0)(r0) & dsel
        2 100h & enable & dti(i1)
        2 100h+N & enable & dti(i2)
        2 150h & enable & btr(b3)(r3)
        enable & rdlatch & rdlatch1 & ls=bus & ls=bus1
        2 1 & enable & wrrsp
        2 label3 & enable & psdss
        2 M-2 & enable & wrcntr(c2)
label4: 2 K/2-2 & enable & wrcntr(c1) & itr(i1)(r1)
        itr(i2)(r2)
label2: 2 N-2 & enable & wrcntr(c0) & btr(b0)(r0)
label1: yinc(c0)(r0) & rd & x=bus & x=bus1 & rdlatch1
        yinc(c1)(r1) & rd & y=bus_ckmr_xus*yus+mr
        rdlatch1 & yinc(c2)(r2) & rd & y=bus_ckmr_xus*yus+mr1
            & branch(unconditional)(c0) & 2 label1 & enable
label3: cont
        enable & y=bus_ckmr_xus*yus & y=bus_ckmr_xus*yus1
            & rdlatch & rdlatch1
        2 N & enable & yadd(c2)(b3)(r1)
        bus=ls & macenable & yinc(c2)(r3) & wr
        bus=ls1 & mac1enable & yinc(c2)(r3) & wr & dccntr(c1) & wrlatch1
        jtwo(sign)(c1) & 2 N & enable & yadd(c2)(b3)(r2)
        jda(unconditional) & 2 label2 & enable
        dccntr(c2) & btr(b0)(r5)
        jtwo(sign)(c2) & 2 N & enable & yadd(c0)(b3)(r5)
        jda(unconditional) & 2 label4 & enable & yrtb(b0)(r5)

Fig 4.4: Microprogram for Matrix multiplication for System 2


As shown in Table 4.3 (the corresponding microprogram is shown in Fig 4.4), in cycle 1 the row element a_00 is fed to both MACs simultaneously. In cycle 2, the column element b_00 (first column) is fed to MAC 1, while in cycle 3 the column element b_01 (second column) is fed to MAC 2. In contrast to System 1 (1 MAC & 1 Memory), where operands could be supplied to the MAC every second cycle, in this architecture (2 MACs & 1 Memory) operands can be supplied to the respective MACs only every third cycle, due to the common-bus configuration.
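The effect of the common bus can be quantified by counting operand transfers in the inner loop. The Python sketch below is an idealised model (function names are hypothetical; loop control and result write-back are ignored): with one MAC, each MAC operation needs two bus transfers, a(m,n) and b(n,k); with two MACs sharing one memory, a(m,n) is broadcast once and b(n,k), b(n,k+1) follow, so two MAC operations need three transfers. The resulting inner-loop bound of 4/3 agrees with the 1.25 to 1.33 estimate drawn from Table 4.3.

```python
def bus_schedule(n_terms, two_macs):
    """Operand transfers on the shared bus for one inner product of length n_terms."""
    slots = []
    for j in range(n_terms):
        if two_macs:
            # a(m,j) broadcast to both MACs, then one b operand per MAC.
            slots += [(f"a(m,{j})", "MAC1+MAC2"),
                      (f"b({j},k)", "MAC1"),
                      (f"b({j},k+1)", "MAC2")]
        else:
            slots += [(f"a(m,{j})", "MAC1"), (f"b({j},k)", "MAC1")]
    return slots


def macs_per_transfer(n_terms, two_macs):
    """MAC operations completed per bus transfer in the inner loop."""
    macs = n_terms * (2 if two_macs else 1)
    return macs / len(bus_schedule(n_terms, two_macs))


ratio = macs_per_transfer(4, True) / macs_per_transfer(4, False)
print(round(ratio, 2))  # 1.33
```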



Table 4.3
MATRIX Vs MATRIX
[A] (MxN) [B] (NxK) = [C] (MxK);  M = 4, N = 4, K = 6

Cycle    Microoperation
Initialisation:
 1       Data = 50h -> (b0)(AG)
 2       Data = 100h -> (i1)(AG)
 3       Data = 100h+N -> (i2)(AG)
 4       Data = 150h -> (r3)(AG)
 5       Data = 0 -> (LS) of MAC1 and MAC2
 6       Data = 1 -> LSP (SEQ)
 7       Data = label3 -> RAM(1)(SEQ)
 8       Data = M-2 -> (c2)(SEQ)
 9       label4: Data = K/2-2 -> (c1)(SEQ) & (i1)(AG) -> (r1)(AG)
10       (i2)(AG) -> (r2)(AG)
11       label2: Data = N-2 -> (c0)(SEQ) & (b0)(AG) -> (r0)(AG)
12       label1: Computation starts
...
24(13)   Continues to next microprogram location.
25(14)   c_00, c_01
[The cycle-by-cycle bus, MAC1 and MAC2 columns for cycles 12-23 are only partly legible in the source: a_0n is fed to both MACs, and MAC1 and MAC2 accumulate Z = Z + a_0n b_n0 and Z = Z + a_0n b_n1, respectively.]
It can be seen from Table 4.3 that, on average, 4 (5) MAC operations are performed in 7 (8) cycles, whereas in System 1 the rate is 3 (4) MACs per 7 (8) cycles. Thus, with 2 MACs, the speed-up is by a factor of 1.25 to 1.33.


Results & Discussion

1. [4x4] x [4x6] multiplication -> 263 cycles

$$ \text{Throughput} = \frac{96}{100 \times 10^{-9} \times 263} \approx 3.66\ \text{MOPS} $$

2. [10x10] x [10x20] multiplication -> 3847 cycles

$$ \text{Throughput} = \frac{2000}{100 \times 10^{-9} \times 3847} \approx 5.2\ \text{MOPS} $$

Comparing these with the results of System 1, we conclude that System 2 attains a speed-up factor of about 1.35-1.36. The theoretical speed-up factor of 2 cannot be approached because of the single-bus configuration.

4.2.2 Algorithm 2: 1-D Convolution

Here the x-sequence is split into two parts, which are stored in data memory at separate locations, FBh and 125h respectively. The overlapping terms are calculated first. While the non-overlapping terms are being calculated, the coefficient elements are fed to both MACs simultaneously.

The microprogram for this algorithm is shown in Fig 4.5 and explained on a cycle-by-cycle basis in Table 4.4.
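The principle of splitting x between the two MACs can be checked with a plain overlap-add model. This Python sketch illustrates the data partitioning only, not the cycle schedule of Fig 4.5; the function names are hypothetical. Each half of x is convolved with h on its own (one half per MAC), and the N-1 outputs where the two partial results overlap, presumably the terms the text calls overlapping, receive contributions from both halves.

```python
def convolve(x, h):
    """Full linear convolution, length len(x) + len(h) - 1."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for k, hv in enumerate(h):
            y[i + k] += xv * hv
    return y


def split_convolve(x, h):
    """Convolve by splitting x into two halves, one per MAC (overlap-add).

    The partial results overlap in outputs y[half : half + N - 1];
    only those outputs need contributions from both halves.
    """
    half = len(x) // 2
    y1 = convolve(x[:half], h)   # work assigned to MAC1
    y2 = convolve(x[half:], h)   # work assigned to MAC2
    y = [0] * (len(x) + len(h) - 1)
    for i, v in enumerate(y1):
        y[i] += v
    for i, v in enumerate(y2):
        y[half + i] += v
    return y


x = list(range(1, 17))   # 16 samples, split 8 + 8 as in Table 4.4
h = [1, 2, 3, 4, 5]      # N = 5
print(split_convolve(x, h) == convolve(x, h))  # True
```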


p_org   50h
N       equ 5
M       equ 8

        2 50h & enable & dti(i0)
        2 50h+N-1 & enable & yrtb(b0)(r0) & dsel
        2 100h-N+2 & enable & btr(b3)(r2)
        2 125h & enable & btr(b3)(r4)
        2 200h & enable & btr(b3)(r6)
        2 1FFh & enable & btr(b3)(r7)
        enable & rdlatch & rdlatch1 & ls=bus & ls=bus1
        2 1 & enable & wrrsp
        2 labels & enable & psdss
        2 label8 & enable & psdss
        2 M-2 & enable & wrcntr(c3)
        2 N-3 & enable & wrcntr(c2) & btr(b0)(r5)
label6: 2 M-3 & enable & wrcntr(c0) & itr(i0)(r0)
label1: yinc(c0)(r0) & rd & x=bus
        ydec(c0)(r4) & rd & y=bus_ckmr_xus*yus+mr
        yinc(c1)(r2) & rd & x=bus1 & rdlatch1
        ydec(c0)(r5) & rd & y=bus_ckmr_xus*yus+mr1 & rdlatch1
        branch(unconditional) & 2 label1 & enable
labels: dccntr(c2) & 2 N & enable & yadd(c0)(b3)(r4)
        2 1 & enable & x=bus1 & rdlatch1 & ckmr & rdlatch
        bus=ls & macenable & y=bus_ckmr_xus*yus+mr1 & rdlatch1
        cont
        enable & y=bus_ckmr_xus*yus & y=bus_ckmr_xus*yus1 & rdlatch
            & rdlatch1 & btr(b0)(r5)
        2 N-2 & enable & ysub(c0)(b3)(r2)
        bus=ls1 & mac1enable & yinc(c3)(r6) & wr & wrlatch1
            & jtwo(sign)(c2)
        jda(unconditional) & 2 label6 & enable
        ydec(c0)(r2)
label?: 2 N-2 & enable & wrcntr(c1) & itr(i0)(r0)
label?: yinc(c0)(r0) & rd & x=bus & x=bus1 & rdlatch1
        ydec(c0)(r4) & rd & y=bus_ckmr_xus*yus+mr
        ydec(c0)(r2) & rd & y=bus_ckmr_xus*yus+mr1 & rdlatch1
            & branch(unconditional)(c1) & 2 label? & enable
label8: 2 N+1 & enable & yadd(c0)(b3)(r4) & dccntr(c3)
        enable & y=bus_ckmr_xus*yus & y=bus_ckmr_xus*yus1 & rdlatch
            & rdlatch1
        2 N-1 & enable & yadd(c0)(b3)(r2)
        bus=ls & macenable & yinc(c0)(r6) & wr
        bus=ls1 & mac1enable & ydec(c0)(r7) & wr & jtwo(sign)(c3)
            & wrlatch1
        jda(unconditional) & 2 label? & enable

Fig 4.5: Microprogram for 1-D convolution for System 2



Table 4.4
1-D Convolution
y(k) = x(n) * h(n);  M_1 = 8, M_2 = 8, N = 5, K = M_1 + M_2 + N - 1

Cycle     Microoperation
Initialisation:
          Data = 50h -> (i0)(AG)
          Data = 50h+N-1 -> (b0)(AG)
          Data = 100h-N+2 -> (r2)(AG)
          Data = 125h -> (r4)(AG)
          Data = 200h -> (r6)(AG)
          Data = 1FFh -> (r7)(AG)
          Data = 0 -> (LS-reg) (MAC1 and MAC2)
          Data = 1 -> (LSP)(SEQ)
          Data = labels -> RAM(1)(SEQ)
          Data = label8 -> RAM(2)(SEQ)
          Data = M-2 -> (c3)(SEQ)
          Data = N-3 -> (c2)(SEQ) & (b0)(AG) -> (r5)(AG)
label6    Data = N - ... -> (c0)(SEQ) & (i0)(AG) -> (r0)(AG)
label1    Computation
...
37(24)    Repeats the above operation N-1 times by jumping unconditionally
          to label6.
...
110       Decrements (r2)(AG).
          Data = N-2 -> (c1)(SEQ) & (i0)(AG) -> (r0)(AG)
...
132       Repeats the above computation M times by jumping unconditionally
          to label?.
Results & Discussion

1 (i) [5x16] convolution -> 287 cycles.

  (ii) Throughput:

$$ \text{Throughput} = \frac{100}{100 \times 10^{-9} \times 287} \approx 3.5\ \text{MOPS} $$

2 (i) [10x30] convolution -> 971 cycles.

  (ii) Throughput:

$$ \text{Throughput} = \frac{390}{100 \times 10^{-9} \times 971} \approx 4.02\ \text{MOPS} $$

Compared with System 1 (326 and 1020 cycles for the same two cases), System 2 attains a speed-up factor of only about 1.05-1.14. The theoretical speed-up factor of 2 cannot be achieved because of the single-bus configuration.


4.3 System 3: 2 MACs and 2 Memory Units

In this architecture, each MAC has its own local memory connected through a local bus. The two units are then connected to the HOST by a common bus.

4.3.1 Algorithm 1: Matrix Multiplication

In this architecture, the B matrix is stored in both data memories. Half of the A matrix is stored in one data memory (Mem 1) and the other half in the other data memory (Mem 2) [Fig 4.6].


   Memory 1:  a_00 a_01 a_02 a_03 | a_20 a_21 a_22 a_23 | b_00 ... b_35
   Memory 2:  a_10 a_11 a_12 a_13 | a_30 a_31 a_32 a_33 | b_00 ... b_35

Fig 4.6 Storing of the matrices in two separate memories




As shown in Table 4.5 (the corresponding microprogram is given in Fig 4.7), two elements from separate rows, c_{m,k} (i.e. c_00) and c_{m+1,k} (i.e. c_10), are calculated simultaneously in MAC1 and MAC2, respectively. The microoperations shown in Table 4.1 for the 1MAC-1MEM case are essentially duplicated and executed in parallel in the two MACs, thereby doubling the throughput. MAC1 outputs c_00 in the 12th "computation cycle", as in Table 4.1, and it is written into Memory 1. The element c_10, which is available in the MR-reg of MAC2 at the same time as c_00, can however be output on the bus only after c_00, i.e. in the 13th computation cycle, due to the common-bus structure. Here we assume that the product matrix is stored row-wise in one memory.
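To make the data partitioning of Fig 4.6 concrete, the Python sketch below models System 3 at the level of whole inner products. It is a hypothetical illustration, not the microprogram of Fig 4.7: the function name system3_matmul is invented, B is taken to be replicated in both memories as the text states, the number of rows of A is assumed to be even, and the common bus is modelled only as the order in which results are written (c(m,k) before c(m+1,k)).

```python
def system3_matmul(a, b):
    """Even rows of A in Memory 1 (MAC1), odd rows in Memory 2 (MAC2).

    Returns the product C and the order in which results cross the common bus.
    """
    m_rows, n, k_cols = len(a), len(b), len(b[0])
    if m_rows % 2:
        raise ValueError("this sketch assumes an even number of rows in A")
    mem1_rows, mem2_rows = a[0::2], a[1::2]
    c = [[0] * k_cols for _ in range(m_rows)]
    bus_writes = []
    for pair, (row1, row2) in enumerate(zip(mem1_rows, mem2_rows)):
        m = 2 * pair
        for k in range(k_cols):
            # Both inner products proceed in parallel on their local buses.
            acc1 = sum(row1[j] * b[j][k] for j in range(n))  # MAC1: c(m,k)
            acc2 = sum(row2[j] * b[j][k] for j in range(n))  # MAC2: c(m+1,k)
            # The shared bus serialises the results: c(m,k) first, then c(m+1,k).
            for row, value in ((m, acc1), (m + 1, acc2)):
                c[row][k] = value
                bus_writes.append((row, k))
    return c, bus_writes


A = [[m * 4 + j + 1 for j in range(4)] for m in range(4)]    # 4x4
B = [[(j + 2 * k) % 5 for k in range(6)] for j in range(4)]  # 4x6
C, writes = system3_matmul(A, B)
print(writes[:3])  # [(0, 0), (1, 0), (0, 1)]
```

Only the result write-back needs the common bus; operand fetches use the local buses, which is why this arrangement comes much closer to a factor of 2 than System 2 does.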


p_org   100h
M       equ 4
N       equ 4
K       equ 6

        2 50h & enable & yrtb(b0)(r0) & dsel & yrtb1(b0)(r0) & dsel
        2 100h & enable & dti(i1) & dti1(i1)
        2 150h & enable & btr1(b3)(r2)
        2 150h+K & enable & btr1(b3)(r3)
        enable & rdlatch & rdlatch1 & ls=bus & ls=bus1
        2 1 & enable & wrrsp
        2 labels & enable & psdss
        2 label2 & enable & psdss
        2 M/2-2 & enable & wrcntr(c2)
labels: 2 K-2 & enable & wrcntr(c1) & itr(i1)(r1) & itr1(i1)(r1)
label4: 2 N-2 & enable & wrcntr(c0) & btr(b0)(r0) & btr1(b0)(r0)
label1: yinc(c0)(r0) & rd & x=bus & yinc1(c0)(r0) & rd1 & x=bus1
        yinc(c0)(r1) & rd & y=bus_ckmr_xus*yus+mr & yinc1(c0)(r1) & rd1
            & y=bus_ckmr_xus*yus+mr1 & 2 label1 & enable
            & branch(unconditional)(c0)
labels: cont
        enable & rdlatch & rdlatch1 & y=bus_ckmr_xus*yus
            & y=bus_ckmr_xus*yus1
        cont
        wrlatch & bus=ls & macenable & rdlatch & wr1 & yinc1(c1)(r2)
        bus=ls1 & mac1enable & yinc1(c1)(r3) & wr1
        branch(unconditional)(c1) & 2 label4 & enable
label2: 2 K & enable & yadd1(c0)(b3)(r3)
        yrtb(b0)(r0) & yrtb1(b0)(r0) & branch(unconditional)(c2)
            & 2 labels & enable

Fig 4.7 Microprogram for Matrix multiplication for System 3



Table 4.5
MATRIX Vs MATRIX
[A] (MxN) [B] (NxK) = [C] (MxK);  M = 4, N = 4, K = 6

Cycle     Microoperation
Initialisation:
1-9       [Largely illegible in the source; the rows include
          Data = 0 -> (LS) of MAC1 and MAC2, and the initialisation of LSP (SEQ).]
10        labels: Data = K-2 -> (c1)(SEQ) & (i1)(AG1) -> (r1)(AG1)
          & (i1)(AG2) -> (r1)(AG2)
11        label4: Data = N-2 -> (c0)(SEQ) & (b0)(AG1) -> (r0)(AG1)
          & (b0)(AG2) -> (r0)(AG2)
Computation (MAC1 | MAC2):
12(1)     a_00 | a_10
...       [Cycles 12(1)-19(8) only partly legible: MAC1 accumulates the
          terms of c_00 and MAC2 those of c_10.]
20(9)     Continues to next sequential microprogram location.
22(11)    Continues to next sequential microprogram location.
23(12)    c_00 written (wr)
24(13)    c_10 written (wr)
...
i         Data = K is added to (r2)(AG2)
i+1       Data = K is added to (r3)(AG2)
i+2       (r0)(AG1) -> (b0)(AG1) & (r0)(AG2) -> (b0)(AG2) & tests for
          sign of c2; if negative, jumps unconditionally to labels.



Results and Discussion

1. [4x4] x [4x6] multiplication -> 185 cycles.

$$ \text{Throughput} = \frac{96}{100 \times 10^{-9} \times 185} \approx 5.22\ \text{MOPS} $$

2. [10x10] x [10x20] multiplication -> 2629 cycles.

$$ \text{Throughput} = \frac{2000}{100 \times 10^{-9} \times 2629} \approx 7.61\ \text{MOPS} $$

The speed-up factor with reference to System 1 is 1.92-1.99, thus approaching the theoretical factor of 2.
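Because every system performs the same number of MAC operations for a given problem, the speed-up over System 1 reduces to a ratio of cycle counts. A short Python sketch using the [10x10] x [10x20] counts reported in this chapter (5237, 3847 and 2629 cycles for Systems 1, 2 and 3); the names are illustrative.

```python
# Cycle counts reported for the [10x10] x [10x20] matrix multiplication.
CYCLES = {"System 1": 5237, "System 2": 3847, "System 3": 2629}


def speedup(reference_cycles, cycles):
    """Speed-up over the reference system for the same workload."""
    return reference_cycles / cycles


for name, count in CYCLES.items():
    print(name, round(speedup(CYCLES["System 1"], count), 2))
# System 1 1.0
# System 2 1.36
# System 3 1.99
```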


4.3.2 Algorithm 2: 1-D Convolution

As explained earlier in Section 4.2.2, the x-sequence is partitioned into two parts, and the two parts are stored in separate memory units. Convolution is performed in the same manner as explained earlier. The microprogram for this is shown in Fig 4.8 and explained on a cycle-by-cycle basis in Table 4.6.



p_org   100h
N       equ 5
M       equ 8

        2 50h & enable & dti(i0) & dti1(i0)
        2 50h+N-1 & enable & yrtb(b0)(r0) & dsel
        2 100h-N+2 & enable & btr(b3)(r2)
        2 125h & enable & btr1(b3)(r4)
        2 150h & enable & btr1(b3)(r6)
        2 14Fh & enable & btr1(b3)(r7)
        enable & rdlatch & rdlatch1 & ls=bus & ls=bus1
        2 1 & enable & wrrsp
        2 label6 & enable & psdss
        2 label? & enable & psdss
        2 M-2 & enable & wrcntr(c3)
        2 N-3 & enable & wrcntr(c2)
label2: 2 N-3 & enable & wrcntr(c0) & itr1(i0)(r0) & btr(b0)(r0)
label1: yinc(c0)(r2) & rd & x=bus & yinc1(c0)(r0) & rd1 & x=bus1
        ydec(c0)(r0) & rd & y=bus_ckmr_xus*yus+mr & ydec1(c0)(r4) & rd1
            & y=bus_ckmr_xus*yus+mr1 & branch(unconditional)(c0) & 2 label1
            & enable
label6: 2 N-2 & enable & rdlatch & ysub(c0)(b3)(r2)
        2 1 & enable & rdlatch1 & x=bus1 & ckmr
        wrlatch & bus=ls & macenable & rdlatch & y=bus_ckmr_xus*yus+mr1
        2 N & enable & yadd1(c0)(b3)(r4)
        enable & rdlatch & rdlatch1 & y=bus_ckmr_xus*yus & dccntr(c2)
            & y=bus_ckmr_xus*yus1
        cont
        bus=ls1 & mac1enable & yinc1(c2)(r6) & wr1 & jtwo(sign)(c2)
        jda(unconditional) & 2 label2 & enable
        ydec(c0)(r2)
label4: itr(i0)(r0) & itr1(i0)(r0) & 2 N-2 & enable & wrcntr(c1)
labels: yinc(c0)(r0) & rd & x=bus & yinc1(c0)(r0) & rd1 & x=bus1
        ydec(c0)(r2) & rd & y=bus_ckmr_xus*yus+mr & ydec1(c0)(r4) & rd1
            & y=bus_ckmr_xus*yus+mr1 & branch(unconditional)(c1) & 2 labels
            & enable
label?: 2 N-1 & enable & yadd(c0)(b3)(r2)
        enable & rdlatch & rdlatch1 & y=bus_ckmr_xus*yus
            & y=bus_ckmr_xus*yus1
        2 N+1 & enable & yadd1(c0)(b3)(r4) & dccntr(c3)
        wrlatch & bus=ls & macenable & rdlatch1 & wr1 & ydec1(c0)(r7)
        bus=ls1 & mac1enable & wr1 & yinc1(c0)(r6) & jtwo(sign)(c3)
        jda(unconditional) & enable & 2 label4

Fig 4.8: Microprogram for 1-D convolution for System 3



Table 4.6
1D-CONVOLUTION
y(k) = x(n) * h(n);  N = 5, M_1 = 8, M_2 = 8, K = N + M_1 + M_2 - 1 = 20

[The cycle-by-cycle entries of Table 4.6 are not legible in the source.]

Results & Discussion

1 (i) [5x16] convolution -> 215 cycles.

  (ii) Throughput:

$$ \text{Throughput} = \frac{100}{100 \times 10^{-9} \times 215} \approx 4.673\ \text{MOPS} $$

2 (i) [10x30] convolution -> 659 cycles.

  (ii) Throughput:

$$ \text{Throughput} = \frac{390}{100 \times 10^{-9} \times 659} \approx 5.93\ \text{MOPS} $$

Compared with System 1, the speed-up factor in this example is about 1.52-1.56.

4.4 META-ASSEMBLER

The user has to feed the microcoded system with the microprogram in binary format, which is also called machine code. Converting every field of each microinstruction into the corresponding binary format by hand is a laborious task. Hence a meta-assembler, which converts a microprogram written in assembly language into machine code, is used here [21].

The meta-assembler works in two passes, as shown in Fig 4.9.


Fig 4.9 Two-Phase Meta-Assembler
[(a) Definition phase: length, fields. (b) Assembly phase: labels, constants, mnemonics, defaults, load instructions; output: assembly listing.]

In the first phase, a definition file is processed to obtain a compact definition file. In the second phase, this compact definition file, together with the microprogram in assembly language, is processed to obtain the machine code for the microprogram. The definition files and compact definitions for the three proposed systems are given in Appendices D, E and F, respectively.
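The two-pass idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the meta-assembler of [21]: the field layout and mnemonic table stand in for a definition file and are invented, the definition phase is folded into two dictionaries, and the syntax is simplified to an optional "label:" prefix followed by "&"-separated items, with hexadecimal constants written with a trailing h as in the figures above. Pass 1 assigns addresses and records labels; pass 2 starts each microinstruction from the field defaults, applies the items, and packs the fields into one word.

```python
# Hypothetical definition: field name -> (lsb position, width in bits, default).
FIELDS = {
    "const": (0, 16, 0),   # immediate data or jump address
    "enable": (16, 1, 0),
    "rd": (17, 1, 0),
    "wr": (18, 1, 0),
    "seq": (19, 4, 0),     # sequencer opcode
}
# Hypothetical mnemonics: mnemonic -> (field, value).
MNEMONICS = {
    "enable": ("enable", 1),
    "rd": ("rd", 1),
    "wr": ("wr", 1),
    "cont": ("seq", 0),
    "jmp": ("seq", 3),
}


def parse_constant(token, labels):
    """Resolve a label, a hex constant such as 50h, or a decimal number."""
    if token in labels:
        return labels[token]
    if token.lower().endswith("h"):
        return int(token[:-1], 16)
    return int(token)


def assemble(source):
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]

    # Pass 1: one address per microinstruction; record label definitions.
    labels, bodies = {}, []
    for addr, line in enumerate(lines):
        if ":" in line:
            name, line = line.split(":", 1)
            labels[name.strip()] = addr
        bodies.append(line.strip())

    # Pass 2: defaults, then '&'-separated items, then pack the fields.
    words = []
    for body in bodies:
        values = {name: default for name, (_, _, default) in FIELDS.items()}
        for item in (t.strip() for t in body.split("&")):
            if item in MNEMONICS:
                field, value = MNEMONICS[item]
                values[field] = value
            else:
                values["const"] = parse_constant(item, labels)
        word = 0
        for name, (lsb, width, _) in FIELDS.items():
            if not 0 <= values[name] < (1 << width):
                raise ValueError(f"{name}={values[name]} does not fit in {width} bits")
            word |= values[name] << lsb
        words.append(word)
    return words, labels


program = """
50h & enable
loop: rd & enable
jmp & loop & enable
cont
"""
words, labels = assemble(program)
print([hex(w) for w in words])  # ['0x10050', '0x30000', '0x190001', '0x0']
```

Keeping the field layout in data rather than code is what makes such an assembler "meta": the same program can target each of the three systems by swapping the definition tables.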





CHAPTER 5 


CONCLUSIONS 

Microcoded systems are extensively used in signal processing applications to achieve high functional parallelism and thereby high throughput. Simulation of a system is a software tool widely used for system verification and program development. In this thesis, three microcoded signal processors have been proposed and a simulator for these systems has been developed.

The simulator permits programs to be fine-tuned in software prior to making significant effort to bring up the target system in hardware. The simulator is interactive: it allows display of registers and memory contents, and both single-stepping and full-speed runs are possible.

Using the simulator, the performance of these systems has been evaluated by executing matrix multiplication and convolution algorithms.

It is observed that, of all the systems, System 3 is the most efficient: it achieves close to double the throughput of System 1 for matrix multiplication (a speed-up of 1.92-1.99) and about 1.5 times the throughput for convolution (1.52-1.56). This is because of its separate bus structure.

It is also found that the single-port structure of the ADSP-1110A processor considerably affects system throughput, since all I/O operations take place via a single port. By replacing this processor with a multiport processor such as the ADSP-1101 (which has two input ports and one output port), and by using a dual-port memory for the 2-processor systems, one can achieve a significant increase in system throughput.
ANALOG DEVICES



Word-Slice™ 
Program Sequencer 




FEATURES 

16-Bit Microcode Addressing Capability
Look-Ahead™ Pipeline 

Extensive Interrupt Processing, With Ten On-Chip 
Interrupt Vectors 

70ns Cycle Time; 25ns Clock-to-Address Delay 
64-Word RAM for Storing: 

Subroutine Linkage 
Jump Addresses 
Counters 
Status Register 

375mW Maximum Power Dissipation with 
CMOS Technology 
48-Pin DIP 


GENERAL DESCRIPTION

The ADSP-1401 is a high-speed microprogram controller optimized for the demanding sequencing tasks found in digital signal processors and general purpose computers. In addition to high speed (25ns clock-to-address delay) and large addressing range (64K of program memory), this Word-Slice component has unique features that make it highly versatile:

• on-chip storage and control of ten prioritized and maskable interrupts

• four decrementing event counters

• absolute, relative and indirect addressing capability

• download capability (writeable control store) and

• a dynamically configurable 64-word RAM.

The ADSP-1401 microprogram sequencer's main task is to provide the appropriate microprogram addressing to support programming requirements (e.g., looping, jumping, branching, subroutines, condition testing and interrupts). An internal Look-Ahead pipeline, controlled by both phases of the clock, allows the ADSP-1401 to satisfy these requirements at very high speed.

During each micro-instruction, the ADSP-1401 monitors the conditions and instructions to determine the next microprogram address. This address can come from one of several sources: the stack, the jump address space in the RAM, the data port, the interrupt vectors, or the microprogram counter. An extensive set of conditional instructions is also available, including jumps, branches, subroutines, interrupts, and writeable control store.


Look-Ahead and Word-Slice are trademarks of Analog Devices, Inc.





WORD-SLICE™ MICROCODED SYSTEM WITH ADSP-1401


The ADSP-1401's internal 64-word RAM is user-configurable into three regions: subroutine stack, register stack and indirect jump address space. The subroutine stack is used for linking interrupts and subroutines and, during their execution, allows storage of system states. The register stack allows association of unique jump addresses with various levels of interrupts and subroutines (both local and global stacks are provided). Indirect jump capability is also supported, addressing for which is provided at the data port.

Interrupts are handled entirely on chip. The ADSP-1401's internal interrupt control logic includes registers for eight external (user) interrupt vectors, a mask register, and a priority decoder. Two additional vectors are reserved for internally-generated interrupts resulting from counter underflow and stack limit violation. A stack limit violation is caused by stack overflow, underflow or collision. A mechanism is provided for recovering from stack violations.
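The role of the mask register and priority decoder can be illustrated with a small model. This Python sketch is hypothetical and not taken from the data sheet: it assumes that bit i of the pending and mask words refers to IRi (i = 0 to 9), that priority rises with the interrupt number (consistent with IR9 being described later as the highest-priority interrupt), that a set mask bit disables an interrupt, and that IR9 cannot be masked, as the trap description later in this data sheet states for trap interrupts. The actual bit assignments and mask polarity of the ADSP-1401 may differ.

```python
def select_interrupt(pending, mask):
    """Return the number of the interrupt to service, or None (hypothetical model).

    Assumptions: bit i refers to IRi (i = 0..9), a higher number means higher
    priority, a set mask bit disables that interrupt, and IR9 is non-maskable.
    """
    trap = 1 << 9
    enabled = (pending & ~mask) | (pending & trap)
    for i in range(9, -1, -1):      # scan from highest to lowest priority
        if enabled & (1 << i):
            return i
    return None


print(select_interrupt(pending=0b0000010110, mask=0b0000010000))  # 2
```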

The ADSP-1401's four decrementing 16-bit counters are used to track loops and events. These counters generate a signal when negative. This negative condition is used by several conditional instructions and can also trigger an internal interrupt.


Two Technology Way; Norwood, MA 02062-9106 U.S.A. 
Tel: 617/329-4700 Twx: 710/394-6577 

Telex: 174059 Cables: ANALOG NORWOODMASS 







Figure 1. ADSP-1401 Block Diagram
ADDRESSING MODES 

Direct: both absolute and relative 
Indirect: from internal RAM

HARDWARE FEATURES 
Instruction Port 
Bidirectional Data Port 
Four Input Address Multiplexer 
Three Stack Pointers 
Four Event Counters
Condition Flag 

Eight Prioritized and Maskable User Interrupts 
TTR Pin: 

Trap 

Three-State 

Reset 

INSTRUCTION TYPES 
Jumps and Branches 
Stack Operations 
Status Register Operations 
Counter Operations 
Interrupt Control 
Relative Address Width Controls 
Instruction Hold Control 
Writeable Control Store 
Dedicated Counter Underflow Interrupt 
Dedicated Stack Overflow Interrupt 


ADSP-1401 PIN ASSIGNMENTS

Pin Name   Description
I6-I0      The 7-bit microinstruction controlling the ADSP-1401.
Y15-Y0     Output bus which provides addresses to the microprogram memory.
D15-D0     Bidirectional data bus for transferring data to or from the ADSP-1401.
EXIR4-1    Four external interrupt request lines. Note that internal circuitry
           supports 8 interrupts with the aid of an external 2 to 1 multiplexer.
CLK        External clock input.
FLAG       An input used for conditional instructions. Its source is usually
           a condition multiplexer.
TTR        A multi-purpose pin accommodating traps, output disable and reset.
VDD        +5 Volt supply.
GND        Ground.




In a cache-based system, microcode is generally executed from
the high-speed cache. If an access is attempted to code not
resident in the cache area, the cache memory controller must
detect the discrepancy and generate an exception to the access (a
“cache miss”). Then, the missing code segment must be
downloaded to the cache memory area (see: Instruction Set Description
- Writeable Control Store, 2.7).

When a cache miss occurs, the cache memory control logic 
asserts the TTR pin while stretching the system clock LO. 

Upon detecting the trap request, the sequencer immediately 
generates the highest priority interrupt, IR9, replacing the current 
address (that causing the cache miss). The cache miss address is 
pushed on the subroutine stack and popped after the interrupt 
service routine has reloaded the cache area with the missing 
code segment. 

The trap interrupt differs from the standard interrupt protocol 
in three ways: 

1. The interrupt vector, IRV9, is output asynchronously, i.e., it
occurs a specified delay after the Trap signal is asserted and must
occur before the next cycle. To accomplish this, a clock stretch
cycle may be needed to allow enough time to fetch the new
instruction.

2. The current address is pushed onto the SS for later restoration
(after the cache miss is resolved), whereas standard interrupts
push the current address + 1.

3. Trap interrupts cannot be masked or disabled. Note that if 
IR9 is also used for stack overflow and underflow, the service 
routine must discriminate which actually occurred. 

Caution: because trapping is asynchronous, spikes on the TTR 
pin wider than 3ns during clock LO may initiate inadvertent 
trapping. 

Three-State 

The address port is placed in a high-impedance state when the
TTR pin is HI during clock HI and LO during clock LO. The
TTR signal is latched during clock LO and transparent during 
clock HI. This facilitates full cycle, three-state control. (Note 
that the IDLE instruction can also place the address port in a 
high-impedance state.) 

Reset 

The TTR pin may be used to initialize the ADSP-1401 by asserting
it (HI for both clock phases) for at least three full cycles. Use of
the reset operation alone does not require the multiplexing
described above. However, if the trap and/or three-state controls
are also needed, they must not be asserted in the same cycle (an
abnormal situation), because that combination constitutes a reset. The
RESET signal forces a zero output address, places the data port
in the high-impedance state, and resets internal registers as
follows:


Sequencer Status after RESET Operation

Parameter                         Reset Condition
Program Counter                   µCode location 0000 (hex)
Subroutine Stack Pointer (SSP)    RAM location 00 (decimal)
Stack Limit Register (SLR)        RAM location 32 (decimal)
RAM Data                          No change
Counters                          No change
Interrupt Mask (SR15-6)           All bits ‘0’ (unmasked)
Interrupt Vector File             No change
Interrupt Vector Pointer (IVP)    Set to IRV0
SR5-4                             ‘00’ (16-bit relative offsets)
SR3                               ‘0’ (LSP selected)
SR2                               ‘0’ (interrupts disabled)
SR1                               ‘0’ (sign bit cleared)
SR0                               ‘0’ (latched interrupt mode)
Writeable Control Store Mode      Cleared


NOTE: 

The first instruction (microcode location 0000 hex) must be a “CONT”.


2.0 INSTRUCTION SET DESCRIPTION 

The instruction set is divided into seven categories pertaining to 
generic operation (see data sheet outline or Mnemonics and 
Opcodes, 4.5). 

Several instructions employ two instruction bits (I1 and I0) to
specify a counter (C3-0) and/or a local register (R3-0, relative to
the RSP) as arguments. Nine of the conditional instructions use
another two instruction bits (I3 and I2) to select one of the four
condition modes:

‘00’ UNCONDITIONAL

‘01’ NOT FLAG

‘10’ FLAG

‘11’ SIGN

The sign bit of the status register, SR1, may also be used to
(implicitly or explicitly) store an external condition. This is
useful if the condition results from an operation performed in
the middle of a loop but is not tested until the end; the loop is
exited with an “If Sign: Jump” instruction. Recall that any
subsequent counter operations will overwrite SR1.
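A simulator has to turn the two condition bits into a yes/no decision on every conditional instruction. The Python sketch below is one minimal way to do it, assuming the FLAG input level and status bit SR1 are passed in as Booleans; the function and constant names are hypothetical and are not ADSP-1401 mnemonics.

```python
# Condition codes carried in instruction bits I3 and I2, per the list above.
UNCONDITIONAL, NOT_FLAG, FLAG, SIGN = 0b00, 0b01, 0b10, 0b11

def condition_met(cc: int, flag: bool, sign_bit: bool) -> bool:
    """Return True if the condition selected by cc holds.

    flag     -- level of the external FLAG input (True = HI)
    sign_bit -- status register bit SR1
    """
    if cc == UNCONDITIONAL:
        return True
    if cc == NOT_FLAG:
        return not flag
    if cc == FLAG:
        return flag
    if cc == SIGN:
        return sign_bit
    raise ValueError("condition code must be two bits")
```

Note that for some opcodes (for example JDRST and JRS) the ‘11’ pattern selects a different instruction rather than the SIGN test, so a simulator would decode the opcode before calling a helper like this.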


2.1 Jump and Branch Instructions 

Jump and branch instructions provide flow control of microcode
execution, offering three-way branches, jumps, subroutine calls,
returns, and addressing mode selection (see Figure 6). These
instructions support conditional control, allowing addressing
from the register stack, the data port, or the indirect jump
address space in the RAM. Generally, they are of the form:

If Condition: Do Operation; Else, Continue. 

JPCOF IF FLAG: JUMP PC 

The address is not incremented while the flag is at a logic HI,
i.e., PC <= PC. If the flag is LO, the next address is (PC + 1).

JPCNF IF NOT FLAG: JUMP PC

The address is not incremented while the flag is at a logic LO,
i.e., PC <= PC. If the flag is HI, the next address is (PC + 1).



Figure 6. Instruction Flow Charts 


JTWO IF CONDITION: JUMP PC + 2

If the condition specified is met, this instruction causes the next
sequential microprogram address to be skipped. This instruction
allows single-instruction bypassing or interleaving without the need
to provide explicit addressing.

JDA IF CONDITION: JUMP DATA, ABSOLUTE

If the specified condition is met, this instruction causes a jump
to the absolute address at the data port. If the condition is not
met, the next sequential instruction will be executed.

JDR IF CONDITION: JUMP DATA, RELATIVE 

If the condition specified is met, the address at the data port
will be added to the PC and output (jump distance is offset plus
one). The offset width is determined by the address width selection
(8, 12, or 16 bits). If the condition is not met, the next sequential
instruction will be executed.

JDI IF CONDITION: JUMP DATA, INDIRECT 

If the condition specified is met, this instruction will output the
address stored in the RAM location given by bits D5-0 of the
data port. If the condition is not met, the next sequential instruction
will be executed.


JDRST IF SIGN OF Ci: JUMP DATA, Ci <= Ri;
ELSE, Ci <= Ci - 1

This instruction first tests the sign of the counter, Ci. If negative,
the address at the data port is output and the counter is re-initialized
(reset) with the data in the register pointed to by (RSP + i). If
the sign is positive, the counter is decremented and the next
sequential address is output. The register and counter use the
same subscript, i.
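The JDRST rule can be expressed as a small state-update function. This is a sketch only, assuming each counter is a 16-bit register whose "negative" state is its bit 15, that "next sequential address" means current address + 1, and that Ri has already been read from RAM at RSP + i; `jdrst` and its arguments are hypothetical simulator names.

```python
MASK16 = 0xFFFF

def is_negative(value: int) -> bool:
    """A 16-bit counter is treated as negative when its MSB (bit 15) is set."""
    return bool(value & 0x8000)

def jdrst(pc: int, data_port: int, counter: int, reg_i: int):
    """IF SIGN OF Ci: JUMP DATA, Ci <= Ri; ELSE, Ci <= Ci - 1.

    Returns (next_address, new_counter_value).
    """
    if is_negative(counter):
        # Negative: jump to the data-port address and reload Ci from Ri.
        return data_port & MASK16, reg_i & MASK16
    # Positive: fall through to the next address and count down.
    return (pc + 1) & MASK16, (counter - 1) & MASK16
```

Loading Ri with the loop count lets the counter re-arm itself every time the loop exits, which is the point of the "reset" in the mnemonic.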

JRC IF CONDITION: JUMP Ri (COND ≠ SIGN)

If the condition specified is met, output the address in RAM at
the location (RSP + i), where i is given by I1-0 of the instruction.
The selected condition may not be SIGN, as this is the JRS
instruction. The PC may be pushed on the register stack and
referenced as a register, thus allowing a “jump to stack” instruction,
which is useful for looping.

JRS IF SIGN OF Ci: JUMP Ri, Ci <= Ci - 1;
ELSE, Ci <= Ci - 1

This instruction first tests the sign of counter Ci. If negative,
output the address in RAM at location (RSP + i). If the sign is
positive, the next sequential microprogram address is output.
The counter is always decremented after the test.



JSA IF CONDITION: JUMP SUBROUTINE, ABSOLUTE

If the condition specified is met, the 16-bit absolute address at 
the data port is output and the PC will be pushed onto the 
subroutine stack. If the condition is not met, the next sequential 
instruction will be executed. 

JSR IF CONDITION: JUMP SUBROUTINE, RELATIVE

If the condition specified is met, the address at the data port is
added to the PC and output (jump distance is offset plus one)
and the PC is pushed onto the subroutine stack. The offset
width is determined by the address width selection (8, 12, or
16 bits). If the condition is not met, the next sequential instruction
will be executed.

RTN IF CONDITION: RETURN FROM SUBROUTINE

This instruction is used to return from subroutines. If the condition
specified is met, the subroutine stack is POPped, which outputs
the return address and decrements the SSP. If the condition is
not met, the next sequential instruction will be executed.

BRANCH IF SIGN OF Ci: JUMP Ri, Ci <= Ci - 1;
ELSE, IF CONDITION: JUMP DATA, Ci <= Ci - 1;
ELSE, Ci <= Ci - 1 (COND ≠ SIGN)

This instruction implements a three-way branch with the address
source from the data port, register Ri, or the PC. The instruction
first tests the sign bit of the counter Ci; if negative, the output
address is given by Ri, i.e., the RAM word at RSP + i. If the sign
was not true, but the specified condition is true, the address source
is the data port. If the sign was not true and the condition is not
met, the next sequential instruction is executed.

The counter and the register use the same subscript value i. 

The counter is always decremented. Note that this instruction 
uses only absolute data addresses; relative addressing is not 
available with the three-way branch instruction. 
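The priority order of the three-way branch (counter sign first, then the condition, then fall-through) is easy to get wrong in a simulator. The sketch below encodes it directly, assuming `cond_met` has already been evaluated for a non-SIGN condition and that Ri has been read from RAM at RSP + i; all names are hypothetical.

```python
MASK16 = 0xFFFF

def branch(pc: int, data_port: int, counter: int, reg_i: int, cond_met: bool):
    """Three-way BRANCH. Returns (next_address, new_counter_value).

    The counter is decremented on every path, as the text requires.
    """
    if counter & 0x8000:        # sign of Ci set: address from register Ri
        target = reg_i
    elif cond_met:              # condition true: absolute data-port address
        target = data_port
    else:                       # neither: next sequential instruction
        target = pc + 1
    return target & MASK16, (counter - 1) & MASK16
```

Because only absolute data addresses are allowed here, the data-port value is used unmodified rather than being added to the PC.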


2.2 Stack Operations 
Subroutine Stack 

Subroutine Stack Pointer (SSP) instructions are used for main- 
taining the subroutine stack. These instructions may also be 
used to upload or download the entire RAM for examination, 
stack expansion or context switches. 

PSDSS PUSH DATA ONTO SS 

Increments the stack pointer and then loads the RAM location 
specified by the SSP with the data at the data port. 

PPSSD POP SS TO DATA PORT 

Transfers the contents of the stack location given by the stack 
pointer to the data port and decrements the stack pointer. 


WRSSP WRITE SSP 

Loads the SSP with bits D5-0 of the data port.

RDSSP READ SSP 

Read the 6-bit subroutine stack pointer. This allows the value of
the stack pointer to be saved or examined. Bits D5-0 of the data
port correspond to bits 5-0 of the SSP. The 10 MSBs of the
data port (D15-6) are undefined.

DSSP DECREMENT SSP 

Decrements the stack pointer without reading. 
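The subroutine-stack instructions differ mainly in whether the pointer moves before or after the RAM access. The class below is a minimal Python model of that ordering, assuming a 64-word RAM addressed by the 6-bit SSP and a reset value of 0 (from the reset table); stack-limit checking is deliberately left out, and the class and method names are hypothetical.

```python
class SubroutineStack:
    """Sketch of the SSP instructions over a 64-word RAM (6-bit pointer)."""

    def __init__(self):
        self.ram = [0] * 64
        self.ssp = 0                      # reset value: RAM location 0

    def psdss(self, data: int) -> None:   # PUSH DATA ONTO SS: increment, then write
        self.ssp = (self.ssp + 1) & 0x3F
        self.ram[self.ssp] = data & 0xFFFF

    def ppssd(self) -> int:               # POP SS TO DATA PORT: read, then decrement
        value = self.ram[self.ssp]
        self.ssp = (self.ssp - 1) & 0x3F
        return value

    def dssp(self) -> None:               # DECREMENT SSP without reading
        self.ssp = (self.ssp - 1) & 0x3F

    def rdssp(self) -> int:               # READ SSP: only bits 5-0 are meaningful
        return self.ssp
```

With pre-increment on push and post-decrement on pop, the SSP always points at the current top-of-stack word, which is what RTN relies on.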

Register Stack 

Register Stack Pointer (RSP) instructions are used to upload 
and download the entire RAM for initialization, examination, or 
context switching and to maintain the RAM space allocated to 
local and global jump registers. As previously discussed, register 
stack instructions refer to either the Local Stack Pointer (LSP) 
or the Global Stack Pointer (GSP), depending upon the status 
register (SR3). If SR3 is LO, register stack instructions pertain 
to the LSP. If SR3 is HI, register stack instructions pertain to 
the GSP. 

SGSP SELECT GSP 

Select the Global Register Stack Pointer. Set Status bit SR3 
(HI). 

SLSP SELECT LSP 

Select the Local Register Stack Pointer. Clear Status bit SR3 
(LO). 

RDRSP READ RSP 

Transfers the RSP to data port bits D5-0 for examination or
storage. The 10 MSBs (D15-6) of the D port are undefined.

WRRSP WRITE RSP 

Preload the selected RSP (LSP or GSP) with bits D5-0 of the
data port.

PSPC PUSH PC ONTO RS 

Decrements the RSP and writes the PC to the register stack. 
This instruction may be used to set up a JRC loop (IF 
CONDITION: JUMP Ri = PC). 

PSGSP PUSH GSP ONTO SS 

Increment the SSP and write the GSP onto the subroutine 
stack. 

PPGSP POP GSP FROM SS 

Write the subroutine stack to the GSP and decrement the SSP. 

PSDRS PUSH DATA ONTO RS 

Decrement the RSP and then write the data at the data port 
into the location specified by the updated RSP. 


PPRSD POP RS TO DATA PORT 

Transfers RAM data pointed to by the RSP to the data port and 
then increments the RSP. 


AIRSP ADD i TO RSP 

Add i to the register stack pointer. Note that i = 0, 1, 2, or 3 in 
this instruction corresponds to 4, 1, 2, or 3, respectively. This 
instruction effectively removes up to four registers from the 
stack. 

S1RSP SUBTRACT ONE FROM RSP

Subtract 1 from the RSP without a write. This instruction is 
used to modify the RSP without explicitly reloading it. 

S4RSP SUBTRACT FOUR FROM RSP 

Subtract four from the RSP without a write. This instruction 
may be used to modify the RSP without explicitly reloading it. 
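The register stack grows downward, the opposite direction to the subroutine stack, and the AIRSP encoding maps i = 0 to 4. The Python sketch below models the pointer arithmetic with SR3 choosing between LSP and GSP. It assumes both stacks share one 64-word RAM and that both pointers reset to 0; those choices, and all names, are illustrative assumptions.

```python
class RegisterStack:
    """Sketch of register-stack pointer arithmetic (LSP/GSP selected by SR3)."""

    def __init__(self, ram):
        self.ram = ram            # assumed shared 64-word RAM
        self.lsp = 0
        self.gsp = 0
        self.sr3 = 0              # 0 selects LSP, 1 selects GSP

    @property
    def rsp(self) -> int:
        return self.gsp if self.sr3 else self.lsp

    @rsp.setter
    def rsp(self, value: int) -> None:
        if self.sr3:
            self.gsp = value & 0x3F
        else:
            self.lsp = value & 0x3F

    def psdrs(self, data: int) -> None:   # PUSH DATA ONTO RS: decrement, then write
        self.rsp = self.rsp - 1
        self.ram[self.rsp] = data & 0xFFFF

    def pprsd(self) -> int:               # POP RS TO DATA PORT: read, then increment
        value = self.ram[self.rsp]
        self.rsp = self.rsp + 1
        return value

    def airsp(self, i: int) -> None:      # ADD i TO RSP; the encoding i = 0 means 4
        self.rsp = self.rsp + (i if i else 4)

    def s1rsp(self) -> None:              # SUBTRACT ONE FROM RSP
        self.rsp = self.rsp - 1

    def s4rsp(self) -> None:              # SUBTRACT FOUR FROM RSP
        self.rsp = self.rsp - 4

    def register(self, i: int) -> int:    # Ri is the RAM word at RSP + i
        return self.ram[(self.rsp + i) & 0x3F]
```

After two pushes, R0 is the most recent value and R1 the one before it, which is why JRC and BRANCH can address recently pushed loop addresses by a small index.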

2.3 Status Register Operations

The status register bits, SR15-0, contain ten mask bits, SR15-6,
for masking interrupts IR9-0, and six control bits, SR5-0 (see
Bidirectional Data Port, 1.4). The entire status register can be
read or written via the data port, or pushed or popped to/from
the subroutine stack. Upon RESET, the entire status register is
initialized to zero.
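When RDSR returns a 16-bit word, a simulator display needs to split it into fields. The function below is a sketch following the Status Register Bit Assignments listed under Mnemonics and Opcodes, 4.5. It assumes that mask bit SR(6 + n) belongs to IRn (so SR6 masks IR0 and SR15 masks IR9); that ordering is an inference, and the dictionary keys are hypothetical names.

```python
def decode_sr(sr: int) -> dict:
    """Split a 16-bit ADSP-1401 status register value into named fields."""
    width_code = (sr >> 4) & 0b11                      # SR5-4
    return {
        "irq_mask": (sr >> 6) & 0x3FF,                 # SR15-6, bit n -> IRn (assumed)
        "jump_width": {0b00: 16, 0b01: 8, 0b10: 8, 0b11: 12}[width_code],
        "ihc_mode": width_code == 0b10,                # '10' = IHC mode (8-bit width)
        "gsp_selected": bool(sr & 0x0008),             # SR3 HI selects GSP
        "interrupts_enabled": bool(sr & 0x0004),       # SR2
        "sign": bool(sr & 0x0002),                     # SR1
        "transparent_interrupts": bool(sr & 0x0001),   # SR0 HI = transparent
    }
```

Applied to the post-RESET value of 0, this gives 16-bit relative jumps, LSP selected, interrupts disabled and latched interrupt mode, matching the reset table.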

RDSR READ SR 

The entire status register (SR15-0) is output over the data port
(D15-0).

WRSR WRITE SR 

Write the data port (D15-0) to the status register (SR15-0).

PSSR PUSH SR ONTO SS 

Increment the SSP and then write the status register to the 
subroutine stack. 

PPSR POP SR FROM SS 

The top of the subroutine stack is written into the status register, 
and then the SSP is decremented. 

2.4 Counter Operations 

Counters may be pushed and popped to/from the subroutine 
stack or loaded directly from the data port. The counters may 
be read externally by pushing the counters onto the subroutine 
stack then popping the subroutine stack to the data port. The 
device has four counters, denoted Ci, which are indexed by the
two LSBs of the instruction.

If a jump is required after N events (until sign), the counter
should be loaded with two less than the number of events desired
(N - 2). If a jump is required for N events (while sign), the
counter is loaded with 2^15 + N - 2 = 8000 (hex) + N - 2.
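The two loading rules are simple 16-bit arithmetic; the helpers below (hypothetical names) compute them so a simulator user does not have to do the hex by hand. Adding 0x8000 sets bit 15, so in the "while sign" case the counter starts out negative.

```python
def until_sign_load(n: int) -> int:
    """Counter value for a jump after N events: N - 2."""
    return (n - 2) & 0xFFFF

def while_sign_load(n: int) -> int:
    """Counter value for a jump for N events: 2**15 + N - 2 = 0x8000 + N - 2."""
    return (0x8000 + n - 2) & 0xFFFF
```

For N = 10 these give 8 and 0x8008 respectively.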

Care must be taken when using the counter underflow interrupt
(IR0, see 1.4.3) to clear the sign bit before the IR0 mask bit is
cleared.

WRCNTR WRITE Ci

Write to the selected counter, Ci, from the data port.

CLRS CLEAR SIGN BIT 

Clear status register bit SR1.


SETS SET SIGN BIT

Set status register bit SR1.

PSCNTR PUSH Ci ONTO SS

Increment the SSP and write the specified counter onto the
subroutine stack.

PPCNTR POP Ci FROM SS

Transfer the data from the subroutine stack to the counter
specified by the instruction, then decrement the SSP.

DCCNTR DECREMENT Ci

Unconditionally decrement counter Ci.

IFCDEC IF CONDITION: DECREMENT C0

Decrement counter C0 on condition. The status register bit SR1
is used if the sign condition is selected.

2.5 Interrupt Control 

Detailed interrupt operation is described in the Interrupts section 
(1.4.3). Here, specific interrupt operations such as interrupt 
clearing, IRV read/write, interrupt mask manipulation, etc., are 
described. 

CCIR CLEAR CURRENT INTERRUPT 

Allows nesting of user interrupts IR8-1 on subsequent instructions
by clearing both the interrupt latch bit currently being serviced
and the interrupt in progress signal (IRIP), re-enabling interrupts.
If an external interrupt is pending, the associated IR vector will
not be output until the cycle following CCIR. Internal interrupts
(IR9 and IR0) are not cleared by CCIR and must be explicitly
cleared through the SLRIVP and CLRS instructions, respectively.

CAIR CLEAR ALL INTERRUPTS 

Clears external interrupt latches IR8-1 and re-enables the interrupt
interface (IRIP cleared LO). The next sequential instruction
will be executed prior to the jump to a pending interrupt. Internal
interrupts (IR9 and IR0) are not cleared by CAIR and must be
explicitly cleared through the SLRIVP and CLRS instructions,
respectively.

RTNIR RETURN FROM INTERRUPT 

Clears the current interrupt latch for IR8-1, re-enables interrupts
(IRIP cleared LO), and pops the return address from the subroutine
stack. The next sequential instruction will be executed
prior to the jump to a pending interrupt routine. Internal interrupts
are not cleared, and the IR9 and IR0 interrupt latches must be
cleared explicitly through the SLRIVP and CLRS instructions,
respectively.

RDIV READ IRV AND INCREMENT IVP

Outputs the interrupt vector currently pointed to by IVP to the 
data port and then increments the IVP. Interrupts should be 
disabled when writing or reading interrupt vectors. 

WRIV WRITE IRV AND INCREMENT IVP 

Writes the interrupt vector currently pointed to by the IVP 
from the data port and then increments the IVP. Interrupts
should be disabled when writing or reading interrupt vectors. 
IRMBC IR MASK BITWISE CLEAR

Allows selected IR mask bits to be cleared. Data port bits D15-6
are applied to status register bits SR15-6 (corresponding to
mask bits for IR9-0). Those data bits which are HI will clear
the mask bit, while those data bits which are LO will leave the
mask bit intact. Data port bits D5-0 are ignored.

IRMBS IR MASK BITWISE SET 

Allows selected IR mask bits to be set. Data port bits D15-6 are
applied to status register bits SR15-6 (corresponding to mask
bits for IR9-0). Those data bits which are HI will set the mask
bit, while those data bits which are LO will leave the mask bit
intact. Data port bits D5-0 are ignored.
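IRMBC and IRMBS are bitwise AND-NOT and OR operations restricted to the mask field. The sketch below (hypothetical function names) shows how a simulator can apply them to a 16-bit status register value while leaving the control bits SR5-0 untouched, as the descriptions above require.

```python
MASK_FIELD = 0xFFC0   # SR15-6: the ten interrupt mask bits

def irmbc(sr: int, data: int) -> int:
    """IR MASK BITWISE CLEAR: HI bits in D15-6 clear the matching mask bits."""
    return sr & ~(data & MASK_FIELD) & 0xFFFF

def irmbs(sr: int, data: int) -> int:
    """IR MASK BITWISE SET: HI bits in D15-6 set the matching mask bits."""
    return (sr | (data & MASK_FIELD)) & 0xFFFF
```

Masking the data word with MASK_FIELD is what implements "Data port bits D5-0 are ignored".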

DISIR DISABLE INTERRUPTS 

Disables the execution of all further interrupts by clearing the
enable interrupt flag (SR2). External interrupts continue to be
latched.

ENAIR ENABLE INTERRUPTS

Enables execution of interrupts by setting the enable interrupt
flag (SR2).

SLIR SELECT LATCHED INTERRUPTS 

Places the interrupt request latches in the latched mode for
interrupts (SR0 LO). Interrupts are latched if they are
valid at the appropriate clock edge. Interrupts IR8-5 are latched
at the positive-going clock edge while IR4-1 are latched at the
negative-going clock edge.

STIR SELECT TRANSPARENT INTERRUPTS 

Places the interrupt request latches in the transparent mode
(SR0 HI) for interrupts IR8-1. The interrupt request is only
valid while the external interrupt inputs are high. Interrupts are 
still processed on the next cycle, so long as they meet the minimum 
interrupt setup specification. Note that selecting transparent 
interrupting will clear any pending interrupts stored in the 
interrupt latch. 

SLRIVP WRITE SLR WITH D5-2,
AND IVP WITH D15-12

Loads the 4-bit stack limit register (SLR) and the 4-bit interrupt
vector pointer (IVP) from the data port. This instruction also
clears the stack overflow interrupt request IR9.

For stack overflow detection, the active 6-bit stack pointer
(SSP, LSP or GSP) is compared to a 6-bit word comprised of
the 4-bit SLR (MSBs) and the two LSBs determined by the
instruction type, as follows:

‘00’ for subroutine stack push (PSDSS); or,

‘11’ for register stack push (PSDRS).

For example, if a stack limit of 36 (decimal) and positioning of the IVP
at IRV7 is desired, the value ‘0111xxxxxx1001xx’ is provided at
the data port. Note that the SLR and IVP cannot be read.
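The worked example above can be reproduced mechanically. The first helper packs a 6-bit stack limit and a vector index into the SLRIVP data word (don't-care bits set to 0); the second models the limit comparison, assuming it is an equality test between the 6-bit pointer and SLR followed by ‘00’ or ‘11’. Both function names are hypothetical.

```python
def slrivp_word(stack_limit: int, ivp: int) -> int:
    """Data-port word for SLRIVP: IVP in D15-12, SLR in D5-2.

    Only bits 5-2 of the 6-bit limit are stored; the two LSBs are
    supplied at compare time according to the push type.
    """
    slr = (stack_limit >> 2) & 0xF
    return ((ivp & 0xF) << 12) | (slr << 2)

def stack_limit_hit(pointer: int, slr: int, register_push: bool) -> bool:
    """Compare a 6-bit pointer with SLR:'00' (SS push) or SLR:'11' (RS push)."""
    limit = (slr << 2) | (0b11 if register_push else 0b00)
    return (pointer & 0x3F) == limit
```

For a limit of 36 and IRV7 the word is 0x7024, i.e. 0111 000000 1001 00, matching the pattern in the text with the x bits cleared.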

The interrupt vector pointer (IVP) addresses the vector file for
reading or writing interrupt vectors. To write interrupt vectors
IRV9-0, the IVP must first be initialized by SLRIVP. The
WRIV instruction (see above) is then used to write the interrupt
vector pointed to by the IVP, which is then incremented
automatically.

2.6 Relative Address Width Controls 

The width control instructions allow reduction of microcode
when Jump Data Relative and Jump Subroutine Relative
instructions need less than the full, 16-bit range. Use these
instructions to sign extend the 8, 12 or 16-bit wide jump data
presented at the data port. The jump width may be selected by
the explicit instructions or by directly setting the status register
bits SR5-4 as described below. Any of these three instructions
will reset the Instruction Hold Control mode (see Miscellaneous
Instructions - IHC, 2.7).

Note that selection of 8-bit width can be made with or without
IHC. For all relative jumps, the jump distance is the offset plus one.

REL16 SELECT 16-BIT RELATIVE JUMPS

Select the 16-bit relative jump. This adds D15-0 at the data port
to the PC to obtain the jump address. The status bits SR5-4 are
set to ‘00’.

REL12 SELECT 12-BIT RELATIVE JUMPS

Selects the jump data from D11-0. The offset is sign-extended,
allowing relative jumps in the range +2047 to -2048. The
status bits SR5-4 are set to ‘11’.

REL8 SELECT 8-BIT RELATIVE JUMPS

Selects the jump data from D7-0. The offset is sign-extended,
allowing relative jumps in the range +127 to -128. The status
bits SR5-4 are set to ‘01’.
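Sign extension is the core of the width controls. The sketch below takes the low 8, 12 or 16 bits of the data-port word and extends the sign, then forms a target address. The target formula is an assumption: it reads "jump distance is offset plus one" as target = address of the jump instruction + 1 + offset, wrapped to 16 bits. Function names are hypothetical.

```python
def sign_extend(data: int, width: int) -> int:
    """Take the low `width` bits of the data port and sign-extend them."""
    if width not in (8, 12, 16):
        raise ValueError("width must be 8, 12 or 16")
    value = data & ((1 << width) - 1)      # D7-0, D11-0 or D15-0
    if value & (1 << (width - 1)):         # top bit set: negative offset
        value -= 1 << width
    return value

def relative_target(jump_addr: int, data: int, width: int) -> int:
    """Assumed reading: target = jump address + 1 + offset, modulo 2**16."""
    return (jump_addr + 1 + sign_extend(data, width)) & 0xFFFF
```

The 12-bit case spans +2047 to -2048 and the 8-bit case +127 to -128, matching REL12 and REL8; bits above the selected width are ignored, which is what lets the upper data-port bits carry other information.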

2.7 Miscellaneous Instructions 

CONT CONTINUE 

Increment and output the next location in microcode memory
without any other changes. Allows straight-line microcode
execution.

IDLE DISABLE OUTPUTS AND JUMP PC

Places the address port into the high-impedance state, inhibiting
program counter (PC) increments. Useful in applications where
multiple sequencers share a common microcode address bus.

This instruction causes the ADSP-1401 to behave as if the clock
had stopped. The IDLE instruction may be latched internally
by using IHC, freeing microcode for use by another device.
Note that while idle, external interrupts will continue to be
registered and should therefore be masked or disabled.

IHC ENABLE INSTRUCTION HOLD CONTROL

Sets SR5-4 to ‘10’ and redefines the function of IR1 to allow a
subsequent instruction to be held for repeated execution, regardless
of the instruction port. Use of the IHC mode requires that the
mask bit for IR1 be set. See Instruction Hold Control, 4.1, for
more details.

While in the IHC mode, asserting IR1 HI (prior to the second
half-cycle of any instruction) will hold that instruction and
disable all interrupts (although they continue to be latched)
until IR1 is brought LO again (again, prior to the second half-cycle).


It is recommended that IR1 be dedicated to control of the IHC
mode (if needed). However, if it must also be used for subsequent
interrupting, then the CAIR instruction should be executed
before unmasking IR1 (to clear the interrupt request resulting
from use of IR1 as the IHC control).

Use of IHC is constrained to 8-bit relative addressing (see Relative
Address Width Controls, 2.6) and clearing IHC is accomplished
by executing any of the relative address width control instructions
(changing status register bits SR5-4).

WCS WRITE CONTROL STORE 

Provides sequential addressing during microcode downloads to a 
RAM based microcode store. The instruction may be interpreted 
as: 

JUMP DATA; 

IF FLAG: DECREMENT C0 AND CONTINUE UNTIL
INTERRUPTED.

Upon initiation of the WCS instruction, the sequencer outputs
the address found at the data port (that of the first instruction
to be downloaded). The external flag is then used to gate
subsequent sequential addressing for the download and the
decrementing of counter C0. This action continues until an interrupt
is detected (from either a C0 underflow or an external source) or
the chip is RESET. Instructions at the instruction port are ignored
during WCS until the interrupt or reset occurs.

The external flag allows synchronization of an external memory
with the sequencer. FLAG should be asserted HI as each new
µcode word is made available for writing to µcode memory.

Notes on Using a Writeable Control Store:

• If a counter interrupt is desired, counter C0 must be
initialized with two less than the length of the microcode
segment to be downloaded.

• If counter interrupting is to be used to exit the WCS
mode, IRV0 should be unmasked and initialized with the
address of the instruction to be executed upon WCS
completion (see Interrupts, 1.4.3 for timing).

• Since interrupting is used to exit the WCS mode, the last
address downloaded is pushed onto the SS as an interrupt
return address. However, because it is not actually a return
address, the SS should be popped immediately by
decrementing the SSP (DSSP) to clear it of this last address.

• Since FLAG is used to gate the download, it should not
become active until after the WCS instruction is executed.

See application note “Writeable Control Store using the
ADSP-1401”.
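The "two less than the length" rule can be checked with a small model of the download loop. The sketch assumes that the first address comes from the data port with no decrement, that each FLAG-gated cycle both advances the address and decrements C0, and that the download ends on the cycle in which C0 goes negative (the IR0 underflow). These timing details are assumptions for illustration, and `wcs_addresses` is a hypothetical name.

```python
def wcs_addresses(start: int, length: int) -> list:
    """Addresses issued during a WCS download of `length` words (assumed timing)."""
    if length < 2:
        raise ValueError("this sketch assumes a segment of at least two words")
    c0 = (length - 2) & 0xFFFF          # C0 loaded with length - 2
    address = start & 0xFFFF
    issued = [address]                  # WCS first outputs the data-port address
    while True:                         # one pass per cycle with FLAG HI
        address = (address + 1) & 0xFFFF
        c0 = (c0 - 1) & 0xFFFF
        issued.append(address)
        if c0 & 0x8000:                 # C0 negative: underflow interrupt ends WCS
            return issued
```

Under these assumptions the loop issues exactly `length` addresses, which is consistent with the note that C0 be loaded with two less than the segment length.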


3.0 SPECIFICATIONS 

This section describes the ADSP-1401’s performance parameters. 
The Specifications Table lists the device’s relevant electrical and 
switching characteristics, while Figure 7 presents the corresponding
timing diagram.
Figure 8. Three-State Reference Levels


ORDERING INFORMATION

Part Number          Temperature Range     Package
ADSP-1401JN          0 to +70°C            48-Pin Plastic DIP
ADSP-1401KN          0 to +70°C            48-Pin Plastic DIP
ADSP-1401JD          0 to +70°C            48-Pin Ceramic DIP
ADSP-1401KD          0 to +70°C            48-Pin Ceramic DIP
ADSP-1401SD          -55°C to +125°C       48-Pin Ceramic DIP
ADSP-1401TD          -55°C to +125°C       48-Pin Ceramic DIP
ADSP-1401SD/+        -55°C to +125°C       48-Pin Ceramic DIP
ADSP-1401TD/+        -55°C to +125°C       48-Pin Ceramic DIP
ADSP-1401SD/883B     -55°C to +125°C       48-Pin Ceramic DIP
ADSP-1401TD/883B     -55°C to +125°C       48-Pin Ceramic DIP



In the case of the program sequencer, for an external load capaci- 
tance of 50pF and a measured slew rate of 0.6V/ns, the peak 
current will be about 30mA. Since there are 16 such drivers, the 
total peak current may approach 480mA! 
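The figures above follow from I = C dV/dt. With C in picofarads and the slew rate in volts per nanosecond, the product comes out directly in milliamperes (1e-12 F × 1e9 V/s = 1e-3 A), as this short check shows; the function name is illustrative.

```python
def peak_current_ma(load_pf: float, slew_v_per_ns: float) -> float:
    """I = C * dV/dt; pF times V/ns gives mA directly."""
    return load_pf * slew_v_per_ns

per_driver = peak_current_ma(50, 0.6)   # about 30 mA per output driver
total = 16 * per_driver                 # 16 address drivers switching together
```

The result, roughly 480 mA for all sixteen outputs switching at once, is why the decoupling advice that follows matters.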


The internal ground and supply lines may undergo a large
disturbance during this transition unless the ADSP-1401 is tied to
a solid ground plane and good high-frequency decoupling is
used (0.1 µF ceramic between GND and Vdd as close as possible
to the device). Otherwise, it is possible that internal data in the
ADSP-1401 may be lost.


4.5 Mnemonics and Opcodes 

Opcode bits “ii” select the relevant register (R3-0) and/or counter
(C3-0). Opcode bits “cc” select the condition to be applied:

‘00’ UNCONDITIONAL
‘01’ NOT FLAG
‘10’ FLAG
‘11’ SIGN

The SIGN condition is precluded from instructions prefixed
with “*”.


Status Register Bit Assignments

Bit #      Function (HI/LO)
SR15-6     IR9-0 Mask Bits
SR5-4      Relative Jump Width Selection:
             ‘00’ = 16-bit relative address width
             ‘01’ = 8-bit width
             ‘10’ = IHC Mode (8-bit width)
             ‘11’ = 12-bit width
SR3        Select GSP/LSP
SR2        Enable/Disable Interrupts
SR1        Set/Clear Sign Bit
SR0        Select Transparent/Latched Interrupts


Mnemonic   Opcode (I6-0)   Description


Jump and Branch Instructions:

JPCOF      001 0101    IF FLAG: JUMP PC (self)
JPCNF      011 0101    IF NOT FLAG: JUMP PC (self)
JTWO       101 cc01    IF COND: JUMP PC + 2 (skip)
JDA        111 cc11    IF COND: JUMP DATA, ABSOLUTE
JDR        111 cc01    IF COND: JUMP DATA, RELATIVE
JDI        101 cc10    IF COND: JUMP DATA, INDIRECT
JDRST      100 11ii    IF SIGN OF Ci: JUMP DATA, Ci <= Ri; ELSE, Ci <= Ci - 1
*JRC       110 ccii    IF COND: JUMP Ri
JRS        110 11ii    IF SIGN OF Ci: JUMP Ri, Ci <= Ci - 1
JSA        111 cc00    IF COND: JUMP SUB, ABSOLUTE
JSR        111 cc10    IF COND: JUMP SUB, RELATIVE
RTN        101 cc11    IF COND: RETURN FROM SUB
*BRANCH    100 ccii    IF SIGN OF Ci: JUMP Ri; ELSE, Ci <= Ci - 1, IF COND: JUMP DATA

Stack Operations:

Subroutine Stack

PSDSS      001 1110    PUSH DATA ONTO SS
PPSSD      011 1110    POP SS TO DATA PORT
WRSSP      000 1110    WRITE SSP
RDSSP      010 1100    READ SSP
DSSP       000 0010    DECREMENT SSP

Register Stack

SGSP       000 0111    SELECT GSP
SLSP       000 0110    SELECT LSP
RDRSP      010 1111    READ RSP
WRRSP      000 1100    WRITE RSP
PSPC       010 0011    PUSH PC ONTO RS
PSGSP      000 0101    PUSH GSP ONTO SS
PPGSP      000 0100    POP GSP FROM SS
PSDRS      001 1111    PUSH DATA ONTO RS
PPRSD      011 1111    POP RS TO DATA PORT
AIRSP      010 10ii    ADD i TO RSP
S1RSP      000 1111    SUBTRACT 1 FROM RSP
S4RSP      011 1100    SUBTRACT 4 FROM RSP


Status Register Operations:

RDSR       010 1110    READ SR
WRSR       001 1100    WRITE SR
PSSR       010 0001    PUSH SR ONTO SS
PPSR       010 0010    POP SR FROM SS

Counter Operations:

WRCNTR     011 10ii    WRITE Ci
CLRS       001 0100    CLEAR SIGN BIT
SETS       011 0100    SET SIGN BIT
PSCNTR     000 10ii    PUSH Ci ONTO SS
PPCNTR     001 10ii    POP Ci FROM SS
DCCNTR     011 00ii    DECREMENT Ci
IFCDEC     101 cc00    IF COND: DECREMENT C0

Interrupt Control:

CCIR       001 0001    CLEAR CURRENT INTERRUPT
CAIR       000 0001    CLEAR ALL INTERRUPTS
RTNIR      000 0011    RETURN FROM INTERRUPT
RDIV       010 1101    READ INTERRUPT VECTOR AND INCREMENT IVP
WRIV       000 1101    WRITE INTERRUPT VECTOR AND INCREMENT IVP
IRMBC      001 0011    IR MASK BITWISE CLEAR
IRMBS      001 0010    IR MASK BITWISE SET
DISIR      001 0110    DISABLE INTERRUPTS
ENAIR      011 0110    ENABLE INTERRUPTS
SLIR       001 0111    SELECT LATCHED INTERRUPTS
STIR       011 0111    SELECT TRANSPARENT INTERRUPTS
SLRIVP     001 1101    WRITE SLR <= D5-2 AND IVP <= D15-12

Relative Address Width Controls:

REL16      010 0100    SELECT 16-BIT RELATIVE ADDRESSING
REL12      010 0111    SELECT 12-BIT RELATIVE ADDRESSING
REL8       010 0110    SELECT 8-BIT RELATIVE ADDRESSING

Miscellaneous Instructions:

CONT       000 0000    CONTINUE
IDLE       001 0000    IDLE
IHC        010 0101    ENABLE INSTRUCTION HOLD CONTROL

WCS

mn noon 

WRITE CONTROL STORE 
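In a simulator, the opcode table becomes a decoder. The sketch below handles only the instructions whose top bit I6 is 1 (the jump and branch group), following the assignments in the table above: I6-4 selects the group, I3-2 carry the condition, and I1-0 carry either a sub-operation or the register/counter index. The ‘11’ condition slot in groups 100 and 110 is treated as selecting JDRST and JRS, which is how the table resolves the "*" restriction. The function name and return format are hypothetical.

```python
def decode_conditional(opcode: int):
    """Decode a 7-bit instruction with I6 = 1. Returns (mnemonic, cc, i);
    cc or i is None when the instruction has no such field."""
    group = (opcode >> 4) & 0b111      # I6-4
    cc = (opcode >> 2) & 0b11          # I3-2
    low = opcode & 0b11                # I1-0
    if group == 0b101:
        return (["IFCDEC", "JTWO", "JDI", "RTN"][low], cc, None)
    if group == 0b111:
        return (["JSA", "JDR", "JSR", "JDA"][low], cc, None)
    if group == 0b100:                 # cc = '11' selects JDRST instead of BRANCH
        return ("JDRST", None, low) if cc == 0b11 else ("BRANCH", cc, low)
    if group == 0b110:                 # cc = '11' selects JRS instead of JRC
        return ("JRS", None, low) if cc == 0b11 else ("JRC", cc, low)
    raise ValueError("this sketch only decodes opcodes with I6 = 1")
```

A table-driven decoder for the remaining I6 = 0 instructions could be built the same way from the lists above.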
ANALOG DEVICES


Address Generator




FEATURES 

16-Bit Addresses with Higher Precision Options 
High Speed, Clock-to-Valid-Address Delay of 20ns 
Look-Ahead™ Pipeline 
Versatile Addressing Hardware: 

30 16-Bit Registers 

16-Bit ALU with Left/Right Shift & Carry I/O
Comparator 
Bit Reverser 
Dual Ports 

Powerful Single-Cycle Looping Instructions 
175mW Maximum Power Dissipation with 
CMOS Technology 
48-Pin DIP 







GENERAL INFORMATION 

The ADSP-1410 is a fast, flexible address generator optimized 
for digital signal/array processors and other high-performance 
computers. This low-power CMOS device rapidly generates the 
data memory addresses required by routines such as digital 
filters, FFTs, matrix operations, and DMAs. With its 16-bit 
architecture, registers, dual ports, and speed, the 48-pin ADSP-1410
improves performance and reduces board space substantially
relative to bit-slice solutions.


The ADSP-1410’s architecture features a 16-bit ALU, a comparator,
and 30 16-bit registers. The registers are organized into
four files: sixteen address (R) registers, six offset (B) registers,
four compare (C) registers, and four initialization registers.

The ADSP-1410 rapidly executes key address generation
operations. In a single instruction cycle, the device can:

• output a 16-bit memory address; 

• modify this memory address; and, 

• detect when the address value has moved to or beyond a
pre-set boundary and conditionally loop back to the
top of a circular buffer.

Consequently, circular buffers and modulo addressing for data 
memories can be implemented without overhead. 

The ADSP-1410’s microcode instructions include commands
for looping, register read/writes, internal data transfers,
and logical/shift operations. Instructions are normally supplied
from an external source. However, an internal Alternate
Instruction Register (AIR) can provide the instruction under external
control, allowing microcode to be conserved in many
applications.


The ADSP-1410 has a 16-bit address (Y) port for outputting
addresses and a 16-bit data (D) port for I/O between internal
and external registers. Also, an internal path allows external
data, provided at the D port, to serve as an ALU source and
to be directly output over the Y port for a DMA capability.

Double-precision (30-bit), single-cycle addressing can be
performed by cascading two ADSP-1410s, with the MSB of each
device dedicated to interchip communication. Alternatively, a
single AG can provide double-precision addresses once every
two clock cycles.

The Look-Ahead™ pipeline eliminates the need for an external
microword register by internally latching instructions
and addresses; microcode bits may be directly routed to the device
from microcode memory. Logically, the Look-Ahead™ pipeline is
split into two halves: the first, located at the instruction (and
data) port; and the second, located at the address port. Each half
of the pipeline (input vs. output) has a transparent latch which
operates out of phase with the other: the address latch is
transparent during the first half of the cycle (clock HI), while the
input latches (instruction and data) are transparent during the
second half of the cycle (clock LO). This complementary arrangement
allows new instructions to be decoded (in preparation for the
following cycle) while the program address for the current cycle
is held steady.
ADSP-1410 OVERVIEW

Digital Signal Processing (DSP) and array processing systems 
require fast, flexible address generation circuitry. An Address
Generator (AG) supplies the address of a location in data or
coefficient memory. The value residing at the specified address
is fetched and fed to an arithmetic unit for processing. The AG
must then modify the address pointer in anticipation of the next 
data fetch. For algorithms that repetitively loop through data 
buffers, the AG may need to compare the address to a buffer 
end and conditionally loop back to the top of the buffer. Finally, 
to maximize throughput, an AG must perform its addressing 
tasks rapidly and without overhead. 

With the ADSP-1410, 16-bit pointers to memory are stored in an address (R) register file. Since an AG must track several pointers concurrently, sixteen R registers, denoted Rn, are provided. If we denote Y as the address port, the operation "$Y \leftarrow R_n$" corresponds to the AG supplying an address from register Rn.

After supplying an address, the AG must update the pointer for the next memory fetch. The updating may be as simple as an increment but, more generally, involves adding or subtracting an arbitrary offset value. Also, algorithms generally access several different offset values. To this end, the AG provides six offset registers, denoted Bm, and can execute in a single cycle the core operation:

$$Y \leftarrow R_n;\quad R_n \leftarrow R_n + B_m.$$

In DSP applications, data arrays are often addressed as circular 
buffers. That is, when addressing reaches the buffer end, it 
wraps back to the beginning of the buffer. To implement this 
looping, the AG compares the supplied address to one of four 
compare registers, denoted Cj. If the address has moved to or beyond the end of the boundary ($R_n \ge C_j$), the device can transfer an initialization register value, denoted Ij, to the register ($R_n \leftarrow I_j$); otherwise, it is updated in the normal fashion ($R_n \leftarrow R_n + B_m$). To minimize overhead, the AG can execute normal updates while also performing conditional re-initializations; again, in one core operation:

$$Y \leftarrow R_n;\quad \text{IF } (R_n \ge C_j):\ R_n \leftarrow I_j;\quad \text{ELSE } R_n \leftarrow R_n + B_m.$$

Since the above instruction handles the looping required of circular buffer addressing, it is termed a looping instruction. To a large extent, the ADSP-1410's architecture and instruction set revolve around efficient implementation of this instruction. Many variations of this instruction are also supported on the device; they are spelled out in the following sections.
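As an illustration of the looping instruction, the Python sketch below models one pre-update core operation on 16-bit registers and uses it to walk a circular buffer. It is a minimal sketch in the spirit of a software simulator, not a complete model of the ADSP-1410: the function names, the buffer base and length, and the choice of loading Cj with the last address to be supplied (with the length a multiple of the step) are assumptions made for the example.

```python
MASK16 = 0xFFFF  # ADSP-1410 address registers are 16 bits wide


def looping_step(r, b, c, i):
    """One pre-update looping instruction (hypothetical model):
    Y <- Rn; IF (Rn >= Cj): Rn <- Ij; ELSE Rn <- Rn + Bm.
    Returns (address driven on Y, new Rn, CMP status)."""
    y = r                           # address is supplied before the update
    cmp_status = r >= c             # moved to or beyond the boundary?
    new_r = i if cmp_status else (r + b) & MASK16
    return y, new_r, cmp_status


def circular_addresses(base, length, step, count):
    """Return `count` addresses from the circular buffer [base, base + length).

    Assumption: Cj holds the last address to be supplied, Ij the buffer base.
    """
    r, c, i = base, base + length - step, base
    out = []
    for _ in range(count):
        y, r, _ = looping_step(r, step, c, i)
        out.append(y)
    return out


print(circular_addresses(0x0100, 4, 1, 10))
# [256, 257, 258, 259, 256, 257, 258, 259, 256, 257]
```

Because the comparison and the conditional reload happen in the same step as the normal update, the wrap-around costs no extra instruction.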


ADDRESS SOURCES 

- Sixteen internal R registers 

- External data provided over the D port 

OFFSET SOURCES 

- Six internal B registers 
- Data Port


OFFSET OPERATIONS

- Increment ($R_n \leftarrow R_n + 1$)

- Decrement ($R_n \leftarrow R_n - 1$)

- Add Offset ($R_n \leftarrow R_n + B_m$)

- Subtract Offset ($R_n \leftarrow R_n - B_m$)

- Single-Bit Left/Right Shifts

- Logical Operations (AND, OR, XOR)


CONDITIONAL RE-INITIALIZATION


- Independent Inhibit/Enable for each of four 
initialization registers 

- Conditional AIR execution (used for true 
modulo addressing) 


OUTPUT/UPDATE SEQUENCE

- Normal (Pre-Update) Mode (output the address 
before update) 

- Post-Update Mode (output the address after 
update) 


PRECISION 

- Single chip supplies 16-bit addresses 

- Two chips cascaded provide 30-bit addresses 

- One chip provides 30-bit addresses in two 
cycles 


ADSP-1410 PIN ASSIGNMENTS 

PIN NAME DESCRIPTION 

Y15 - Y0 The address (Y) output port. In single-chip/double-precision mode, the MSB (Y15) indicates whether the supplied address is the MSW or LSW (see Precision Modes). In two-chip/double-precision mode, the MSB conveys the carry/shift bit from the Least Significant (LS) to the Most Significant (MS) chip.

D15 - D0 The bi-directional data (D) port. In two-chip/double-precision addressing mode, the MSB (D15) of this port conveys CMP status from the partner chip.

I9 - I0 The instruction port.

CMP/Z A dual function pin. Looping instructions, which compare address register values to compare register values, assert this pin HI to convey CMP status if i) $R \ge C$ for positive offsets, or ii) $R \le C$ for negative offsets. Logical/shift instructions assert this pin HI to convey the ZERO status of the result.


DSEL Data Select control. Asserting this control HI causes data set up on the data port to substitute for the R value specified in the instruction.

AIR Enable Alternate Instruction Register control. Asserting 
this control HI causes the device to execute an 
instruction stored in the internal AIR, rather 
than the instruction set up on the instruction 
port. 

CLK Clock 


Vdd + 5 Volt Power Supply 

GND Ground 
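The pin descriptions state only that, in two-chip/double-precision mode, Y15 carries the carry/shift bit from the LS chip to the MS chip. The Python sketch below shows why one bit suffices for an add, under our own assumption (consistent with 30 = 2 x 15, but not spelled out here) that each chip holds 15 address bits in bits 14-0 of its registers. The function names are hypothetical and shifts are not modelled.

```python
FIELD_BITS = 15                     # assumed: each chip carries 15 of the 30 bits
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0x7FFF


def split30(addr):
    """Split a 30-bit address into (MS half, LS half)."""
    return (addr >> FIELD_BITS) & FIELD_MASK, addr & FIELD_MASK


def join30(ms, ls):
    """Rebuild the 30-bit address from the two 15-bit halves."""
    return (ms << FIELD_BITS) | ls


def cascaded_add(r_ms, r_ls, b_ms, b_ls):
    """Add a 30-bit offset using two 15-bit halves.

    The LS chip's carry out of bit 14 lands in bit 15, the bit it drives
    on Y15; the MS chip adds that carry into its own sum."""
    ls_sum = r_ls + b_ls
    carry = (ls_sum >> FIELD_BITS) & 1      # bit passed from LS chip to MS chip
    new_ls = ls_sum & FIELD_MASK
    new_ms = (r_ms + b_ms + carry) & FIELD_MASK
    return new_ms, new_ls, carry


ms, ls, carry = cascaded_add(*split30(0x7FFF), *split30(1))
print(hex(join30(ms, ls)), carry)   # 0x8000 1
```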






Figure 1. ADSP-1410 Functional Block Diagram 















Latched Mode 

(CR6 LO). In latched mode, output values are enabled during phase one and latched at the address (Y) port during phase two.

Use of the latched mode guarantees that outputs remain stable throughout the current cycle (barring tAD) regardless of changes at the instruction port. This is in contrast to the transparent mode, in which such changes may occur quickly enough to alter the output before cycle end.

Post-Update Mode 

(CR9 HI). Addresses are output after the update operation. The 
delay between the start of phase one and output of a valid address 
is extended in this mode to allow for updating. The addresses 
output are equivalent to the values written back into the specified 
address (R) register. In this mode, external data may be brought on chip, modified, and output in a single clock cycle.

Pre-Update Mode 

(CR9 LO). This is the normal update mode in which addresses 
are output over the address (Y) port prior to update operations 
(increment, decrement, offset, shift, and logical), allowing addresses to be generated at maximal speed. Note, however, that this mode requires two cycles to bring external data on chip, modify it, and supply it as an address.

Conditional AIR Execute Mode 

(CR10 HI). In this mode, a valid CMP flag on looping instructions causes the next instruction to be executed from the AIR. The MODULO ADDRESSING section highlights a particularly valuable use of this mode.

Note that conditional re-initialization of address registers is 
disabled when using the conditional AIR execute mode, although 
routine updates (INC, DEC, ADD, and SUB) are still performed 
in accord with the instruction under execution (be it from the 
instruction port or the AIR). 

(CR10 LO). Conditional AIR execution is disabled. Conditional re-initialization is fully operational, contingent upon the re-initialization mask (CR3-0).

Table III summarizes the different ways the CMP status affects operation of the AG as a function of the conditional AIR execute mode control bit, CR10, and the re-initialization mask, CR3-0.


CMP STATUS   CR10 LO, CRj LO   CR10 LO, CRj HI          CR10 HI
LO           No Effect         No Effect                No Effect
HI           CMP/Z goes HI     CMP/Z goes HI; Rn ← Ij   CMP/Z goes HI; next instr. executed from AIR

Table III. Effect of Compare (CMP) Status for Looping Instructions; Note: j = 3-0, the Re-Initialization Mask.
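Table III can be read as a small decision function. The Python sketch below encodes it for use in a simulator; the function name and the string labels are our own, and only the three inputs named in the table are modelled.

```python
def cmp_effects(cmp_status, cr10, cr_j):
    """Effects of a looping instruction's CMP status, following Table III.

    cmp_status: compare condition met on this instruction
    cr10:       conditional AIR execute mode bit
    cr_j:       re-initialization mask bit for the selected Cj/Ij pair
    """
    if not cmp_status:
        return []                                        # "No Effect"
    effects = ["CMP/Z goes HI"]
    if cr10:
        effects.append("next instr. executed from AIR")  # re-init is disabled
    elif cr_j:
        effects.append("Rn <- Ij")
    return effects


for cr10, cr_j in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"CR10={cr10} CRj={cr_j}:", cmp_effects(True, cr10, cr_j))
```

Note that with CR10 HI the mask bit has no influence, matching the statement that conditional re-initialization is disabled in the conditional AIR execute mode.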


INSTRUCTION SET DESCRIPTION 

The ADSP-1410’s instruction set is partitioned into six groups 
which are discussed below. First, however, issues spanning 
several instruction groups are discussed. 

Most of the instruction groups contain instructions using one of the chip's six offset (B) registers. Without exception, these instructions have just two bits available for selecting the B register. Consequently, offset registers are partitioned into two banks. The upper/lower bank selection is maintained in the control register (CR8) and is set or cleared by dedicated instructions. Whenever the "fourth" B register of either bank is specified (B3 or B7), the ALU's offset source becomes external data (see Table IV).


CR8 & TWO-BIT OFFSET (B) REGISTER FIELD   OFFSET SOURCE
0 00                                      B0
0 01                                      B1
0 10                                      B2
X 11                                      Data Port*
1 00                                      B4
1 01                                      B5
1 10                                      B6
X 11                                      Data Port*

Table IV. Offset Value Structure


* Explicit use of DSEL is unnecessary when using B3 or B7 offsets; the offset data is sourced from the data bus by default.
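The bank scheme of Table IV amounts to forming a three-bit register number from CR8 and the two-bit field, with the pattern 11 redirected to the data port. A minimal Python sketch of that mapping (the function name and the "D" label for the data port are our own):

```python
def offset_source(cr8, bb):
    """Offset (B) source selected by CR8 and a two-bit B field (Table IV).

    Returns "B0".."B6" for a register, or "D" for the data port."""
    if cr8 not in (0, 1) or not 0 <= bb <= 0b11:
        raise ValueError("cr8 is one bit and bb is two bits")
    if bb == 0b11:
        return "D"               # the 'fourth' register of either bank: B3 or B7
    return f"B{4 * cr8 + bb}"    # lower bank B0-B2, upper bank B4-B6


print([offset_source(cr8, bb) for cr8 in (0, 1) for bb in range(4)])
# ['B0', 'B1', 'B2', 'D', 'B4', 'B5', 'B6', 'D']
```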

In several instruction groups, address (R) registers are used. In all cases, asserting the DSEL pin allows external data to be substituted for an R value as both output and update data.




Two instruction groups (looping and logical/shift) both supply and update the address. Normally, addresses are supplied prior to updating (pre-update). In post-update mode, however, the addresses are output after the update operation is performed. CR9 controls this mode of operation.

For all instructions accessing an offset register, the MS bit of the three-bit offset register address (the B in Bbb) is fetched from the control register and is programmed by the SELB instruction. This is also the case for the YADD and YSUB instructions (group 1) as pertains to the MS bit of the four-bit address register address (the R in Rrrr), programmed by the SELR instruction. In both cases, it is incumbent upon the programmer to ensure the appropriate register bank is selected.

The Y port is only driven on output instructions (mnemonic 
form Yxxx, see MNEMONICS AND OPCODES). Otherwise, 
the Y port defaults to a high impedance state. 

Instruction Group 1: Looping 

Instructions in the looping group supply the contents of a selected address (R) register to the address (Y) port and then overwrite the R location with an updated value.

All instructions in this group generate an internal CMP status 
indicating whether the supplied address has moved to or beyond 
the boundary specified by the compare register. This status may 
be monitored externally via the CMP/Z pin. Internal to the 
chip, the CMP status can i) be ignored, ii) be used to control 
re-initialization of the R register value with a selected I register 
value (e.g., to restart an addressing loop), or iii) control execution 
of an instruction located in the AIR on the next cycle. Individual 
control register bits determine which option is enforced (see 
Control Register). 

YINC Output & Increment/Init.

Pre-Update Mode:
$Y \leftarrow R_n;\ \text{IF } (R_n \ge C_j) \text{ THEN } R_n \leftarrow I_j \text{ ELSE } R_n \leftarrow R_n + 1.$

Post-Update Mode:
$Y \leftarrow R_n + 1;\ \text{IF } (Y \ge C_j) \text{ THEN } R_n \leftarrow I_j \text{ ELSE } R_n \leftarrow R_n + 1.$

Output an address (R) register on the address (Y) port and compare it to one of the compare (C) registers. If the address is less than Cj, the R location is simply updated with an incremented value. However, if $R_n \ge C_j$, CMP status goes HI and the R register is re-initialized with the Ij value, provided the initialization mask (CR3-0) is enabled for Ij. Note that other modes of operation allow CMP status to be ignored (e.g., the instruction executed is simply $Y \leftarrow R_n;\ R_n \leftarrow R_n + 1$) or to cause the AIR instruction to execute on the next cycle.
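The only difference between the two YINC forms is which value is driven on Y and therefore compared with Cj. The Python sketch below, a hypothetical model with 16-bit wrap-around assumed on the increment, runs both modes from the same starting state:

```python
MASK16 = 0xFFFF


def yinc(r, c, i, post_update=False, reinit=True):
    """Hypothetical model of YINC.

    post_update: CR9 HI (output after update) vs. CR9 LO (pre-update).
    reinit:      True when CR10 is LO and the mask bit for Ij is set.
    Returns (Y output, new Rn, CMP status)."""
    incremented = (r + 1) & MASK16
    y = incremented if post_update else r   # the compare is made on the output
    cmp_status = y >= c
    new_r = i if (cmp_status and reinit) else incremented
    return y, new_r, cmp_status


# Same starting state, two modes: Rn = 9, Cj = 10, Ij = 0.
print(yinc(9, 10, 0))                    # (9, 10, False)
print(yinc(9, 10, 0, post_update=True))  # (10, 0, True)
```

In post-update mode the reload happens one cycle earlier relative to the pre-update case, so in this model the values chosen for Cj and Ij depend on the mode in use.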


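YDEC Output & Decrement/Init.

Pre-Update Mode:
$Y \leftarrow R_n;\ \text{IF } (R_n \le C_j) \text{ THEN } R_n \leftarrow I_j \text{ ELSE } R_n \leftarrow R_n - 1.$

Post-Update Mode:
$Y \leftarrow R_n - 1;\ \text{IF } (Y \le C_j) \text{ THEN } R_n \leftarrow I_j \text{ ELSE } R_n \leftarrow R_n - 1.$

Same as above except the R value is decremented instead of incremented; CMP is valid if the R value is less than or equal to the C value.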


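YADD Output & Add Offset/Init.

Pre-Update Mode:
$Y \leftarrow R_n;\ \text{IF } (R_n \ge C_j) \text{ THEN } R_n \leftarrow I_j \text{ ELSE } R_n \leftarrow R_n + B_m.$

Post-Update Mode:
$Y \leftarrow R_n + B_m;\ \text{IF } (Y \ge C_j) \text{ THEN } R_n \leftarrow I_j \text{ ELSE } R_n \leftarrow R_n + B_m.$

Same as YINC except the R value is summed with the contents of a selected offset (B) register.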


The R register bank select bit (CR7) is used in both the YADD and YSUB (offset) instructions.
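YSUB Output & Subtract Offset/Init.

Pre-Update Mode:
$Y \leftarrow R_n;\ \text{IF } (R_n \ge C_j) \text{ THEN } R_n \leftarrow I_j \text{ ELSE } R_n \leftarrow R_n - B_m.$

Post-Update Mode:
$Y \leftarrow R_n - B_m;\ \text{IF } (Y \ge C_j) \text{ THEN } R_n \leftarrow I_j \text{ ELSE } R_n \leftarrow R_n - B_m.$

Same as YADD except the selected offset (B) register is subtracted from the R value.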


Instruction Group 2: Register Transfers 

Instructions in the register transfer group support internal register transfers, as well as transfers between internal and external registers. Internally, any I or B register may be written directly to any R register. Also, any R register may simultaneously be output and written directly to a B or C register. For an R-to-R transfer, the source R register can first be written to a B register, followed by a write of the B register to an R register on the next cycle.
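The two-cycle R-to-R transfer can be sketched as follows. This Python model covers only the R and B register files; the class and function names are hypothetical, and the B index is taken as the full three-bit number (ignoring the CR8 bank-select step) for simplicity.

```python
class RegisterFileSketch:
    """Hypothetical model of the ADSP-1410 R and B register files only."""

    B_INDICES = (0, 1, 2, 4, 5, 6)   # B3/B7 select the data port, not a register

    def __init__(self):
        self.R = [0] * 16
        self.B = {m: 0 for m in self.B_INDICES}

    def yrtb(self, n, m):
        """YRTB: Y <- Rn; Bm <- Rn. Returns the value driven on Y."""
        if m not in self.B:
            raise ValueError(f"B{m} is not a register")
        self.B[m] = self.R[n]
        return self.R[n]

    def btr(self, m, n):
        """BTR: Rn <- Bm."""
        self.R[n] = self.B[m]


def copy_r_to_r(rf, src, dst, scratch_b):
    """Two-cycle R-to-R transfer through a scratch B register."""
    rf.yrtb(src, scratch_b)   # cycle 1: Rsrc -> Bscratch (also driven on Y)
    rf.btr(scratch_b, dst)    # cycle 2: Bscratch -> Rdst


rf = RegisterFileSketch()
rf.R[2] = 0x1234
copy_r_to_r(rf, 2, 9, scratch_b=1)
print(hex(rf.R[9]), hex(rf.B[1]))   # 0x1234 0x1234
```

The scratch B register is overwritten, so a microprogram would pick one whose offset is not needed at that point.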


Internal registers are read or written externally via the bi-directional 
data port. There are explicit instructions to read any of these 
registers; however, only the I registers have an explicit Write 
instruction. The R, B, and C registers may be written with 
external data by executing a transfer instruction (YRTR, YRTB, 
and YRTC) and asserting the DSEL pin, substituting the external 
data for the designated R value. 

YRTR Output & Transfer Addr. Reg. to Self 

$Y \leftarrow R_n$

Outputs selected address (R) register over the address (Y) port. 
When DSEL is asserted, data port values are output and, in the 
same cycle, written into the selected R register. 

YRTB Output & Transfer Addr. Reg. to Base Reg. 

Outputs selected R register over the Y port and copies it into a 
selected B register. When DSEL is asserted, data port values 
are output and, in the same cycle, written into the selected B 
register. 

YRTC Output & Transfer Addr. Reg. to Comp. Reg. 

$Y \leftarrow R_n;\ C_j \leftarrow R_n$

Same as above, except that values are written to a C register. 

DTI Transfer Data Bus to Init. Reg. 

$I_j \leftarrow D$

Loads selected I register from data (D) port. 

ITR Transfer Init. Reg. to Addr. Reg. 

Selected R register is loaded from an I register, allowing a 
microprogram to restart a loop at any time. 

BTR Transfer Base Reg. to Addr. Reg. 

$R_n \leftarrow B_m$

Loads an R register from a B register. Once in the R register,
the B value may be modified and then returned to the B file 
(using a YRTB instruction). 

RTD Transfer Addr. Reg. to Data Bus 

$D \leftarrow R_n$

Supplies selected R register to data (D) port. 

CTD Transfer Comp. Reg. to Data Bus

$D \leftarrow C_j$

Supplies selected C register to data (D) port. 


ITD Transfer Init. Reg. to Data Bus 

Supplies selected I register to data (D) port. 

Instruction Group 3: Logical & Shift 
Instructions in the logical/shift group supply a value from a selected address (R) register to the address (Y) port and then unconditionally overwrite the selected R location with a modified version of the output. Modify operations include logical (AND, OR, and XOR) and shift (one-bit left/right) operations. All instructions in this group affect the ZERO flag, which goes HI if the result of the modification is zero. The ZERO flag status is available externally over the CMP/Z pin.

YOR Output & Logical OR to Addr. Reg. 

$Y \leftarrow R_n;\ R_n \leftarrow (R_n \text{ OR } B_m)$

Selected R register is supplied to the address (Y) port; the specified R location is then overwritten with the logical OR of the B register and original R value.

YAND Output & Logical AND to Addr. Reg. 

$Y \leftarrow R_n;\ R_n \leftarrow (R_n \text{ AND } B_m)$

Same as above, except that a logical AND is performed. 

YXOR Output & Logical XOR to Addr. Reg. 

$Y \leftarrow R_n;\ R_n \leftarrow (R_n \text{ XOR } B_m)$

Same as above, except that a logical XOR is performed. 

YASR Output & Arithmetic Right Shift to Addr. Reg. 

$Y \leftarrow R_n;\ R_n \leftarrow \text{ASR}(R_n)$

Selected R register is supplied to the address (Y) port; the specified R location is then overwritten with the original R value arithmetically shifted right (ASR) by one bit (the MSB is repeated).

YLSL Output & Logical Left Shift to Addr. Reg.

$Y \leftarrow R_n;\ R_n \leftarrow \text{LSL}(R_n)$

Selected R register is supplied to the address (Y) port; the specified R location is then overwritten with the original R value logically shifted left (LSL) by one bit (the LSB is zero-filled).
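The logical/shift group can be modelled on 16-bit integers in a few lines. In this Python sketch (function and table names are our own) pre-update mode is assumed, so Y receives the original R value; the ZERO flag is set when the value written back is zero.

```python
MASK16 = 0xFFFF


def asr16(value):
    """Arithmetic right shift by one bit; bit 15 (the sign) is repeated."""
    return (value >> 1) | (value & 0x8000)


def lsl16(value):
    """Logical left shift by one bit; the LSB is zero-filled, bit 15 is lost."""
    return (value << 1) & MASK16


LOGIC_SHIFT_OPS = {
    "YOR": lambda r, b: r | b,
    "YAND": lambda r, b: r & b,
    "YXOR": lambda r, b: r ^ b,
    "YASR": lambda r, b: asr16(r),
    "YLSL": lambda r, b: lsl16(r),
}


def logic_shift(mnemonic, r, b=0):
    """Pre-update model: returns (Y output, new Rn, ZERO flag)."""
    result = LOGIC_SHIFT_OPS[mnemonic](r, b) & MASK16
    return r, result, result == 0


y, new_r, zero = logic_shift("YASR", 0x8002)
print(hex(new_r), zero)   # 0xc001 False
```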

Instruction Group 4: Control Register 
Instructions in the control register group reset, read, and write the entire control register or individual control register bits (see Control Register).

Note the use of "x" and "pp" to denote values supplied with the opcode field (see MNEMONICS AND OPCODES). A positive logic convention is used throughout.


BTD Transfer Base Reg. to Data Bus

$D \leftarrow B_m$

Supplies selected B register to data (D) port. 


RST Reset Control Reg.

$CR \leftarrow 0$

Clears the entire control register (CR10-0). The RST instruction has dedicated decoding logic so that it takes precedence even over the second instruction of a conditional AIR sequence.

DTCR Transfer Data Bus to Control Reg. 

$CR \leftarrow D$

Writes the entire control register (CR10-0) from the data port, D10-0.

CRTD Transfer Control Reg. to Data Bus

$D \leftarrow CR$

Outputs the entire control register (CR10-0) over the data port, D10-0.

SETI Set/Clear Conditional Init. on CMP Flag 

Enables conditional re-initialization of an R location, subject to 
CMP status (see Control Register). This instruction loads the x 
value into the control register bit specified by jj. Conditional re- 
initialization of address registers by the Cjj/Ijj pair is inhibited if 
the corresponding CRjj is cleared. 

SETP Set Chip precision 

$CR_{5-4} \leftarrow pp$

Loads a 2-bit code (pp) into control register bits 5 and 4, specifying the addressing mode of the device:

00 = single-precision mode;

01 = double-precision mode, LS chip;

10 = double-precision mode, MS chip;

11 = double-precision mode, single-chip.

If the instruction "SETP, 01" is supplied and the chip's DSEL pin is asserted, the CR5-4 bits are reversed, i.e., the relevant chip is loaded with "10", not "01" (see Precision Modes). This is useful if the MS and LS chips share a common instruction bus.
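The DSEL trick lets one instruction word configure both chips of a cascaded pair. A minimal Python sketch of the CR5-4 load; the text only describes the 01 case, so leaving the other codes unchanged when DSEL is asserted is our assumption, and the names are hypothetical.

```python
PRECISION_MODES = {
    0b00: "single-precision",
    0b01: "double-precision, LS chip",
    0b10: "double-precision, MS chip",
    0b11: "double-precision, single-chip",
}


def setp(pp, dsel=False):
    """Value loaded into CR5-4 by 'SETP, pp'.

    With DSEL asserted, the code 01 is loaded as 10; other codes are
    assumed to pass through unchanged (the text only describes 01)."""
    if pp not in PRECISION_MODES:
        raise ValueError("pp is a two-bit code")
    return 0b10 if (pp == 0b01 and dsel) else pp


# Two chips on a shared instruction bus both receive "SETP, 01";
# only the chip whose DSEL pin is held HI becomes the MS chip.
ls_chip, ms_chip = setp(0b01, dsel=False), setp(0b01, dsel=True)
print(PRECISION_MODES[ls_chip], "/", PRECISION_MODES[ms_chip])
```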

SETY Set Y Port to Transparent/Latched Mode

$CR_6 \leftarrow x$

Uses the LS instruction bit to set the address (Y) port to the 
transparent (HI) or latched (LO) mode. This status is maintained 
in control register bit 6. 

SELR Select Upper/Lower Addr. Reg. Bank 

$CR_7 \leftarrow x$

The LS bit of this instruction provides the missing Address (R) 
register select bit required by the YADD and YSUB instructions. 
This selection is maintained in control register bit 7. 

SELB Select Upper/Lower Base Reg. Bank 

$CR_8 \leftarrow x$

The LS bit of this instruction provides the missing B register 
select bit required by all instructions utilizing offset (B) registers. 
This selection is maintained in control register bit 8. 

SETU Set Update Mode (Post/Pre) 

$CR_9 \leftarrow x$


Setting this bit causes the chip to output address values after 
updating them (post-update mode). The LS bit of this instruction 
determines the value of control register bit 9. 

SETA Set/Clear Conditional AIR Execute Mode 

$CR_{10} \leftarrow x$

Setting this bit causes Looping instructions — conditional on 
CMP status being HI — to execute the following instruction from 
the AIR on the next cycle. In this mode, conditional re-initialization 
of R by I on CMP is inhibited. The LS bit of this instruction 
determines the value of control register bit 10. 

Instruction Group 5: AIR Control 
Instructions in the AIR group write and read the Alternate 
Instruction Register (AIR). The AIR may be written or read 
over the data bus in one cycle or written via the instruction port 
in two cycles (see Table I). The instruction contained in the 
AIR is executed whenever the AIR Enable pin is asserted or on 
the next cycle in the conditional AIR execute mode. 

WRA Write AIR with Data Bus 

$AIR \leftarrow D$

Write the AIR from the data (D) bus (D9-0).

RDA Read AIR at Data Bus 

$D \leftarrow AIR$

Read the AIR over the data (D) bus (D9-0).

LDA Load AIR from Instruction Port on Next Cycle 

(Requires DSEL HI during load) 

$AIR \leftarrow \text{Instruction Port}$

This instruction is the first of a two-cycle sequence that loads the AIR via the instruction port. On the cycle following its execution, the instruction appearing on the instruction port will be loaded into the AIR, provided that the DSEL pin is asserted.
If DSEL is not asserted during the cycle following LDA, the 
AIR is not loaded and a NOP is executed, superseding the 
instruction at the instruction port. 

Instruction Group 6: Miscellaneous 

DTY Pass Data Bus to Y Port 

$Y \leftarrow D$

Data (D) port values are supplied directly to the address (Y) port. Note that internal address (R) registers are not affected by this instruction.

YREV Output Addr. Reg. in Bit-Reversed Format 

The selected address (R) register is bit reversed at the output 
port. The original (unreversed) R value is added to the selected 
offset (B) register, and written back into the specified R location. 
Condition testing is not performed. Bit reversing affects only 
output data, not register contents. 
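A common use of bit-reversed output is FFT data reordering. One way to obtain an N-point (N = 2^k) bit-reversed sequence with YREV, which is our assumption rather than a procedure given here, is to start Rn at 0 and load Bm with 2^(16-k): the register then counts in steps whose 16-bit reversal equals the k-bit reversal of the index. The Python sketch below checks this for a buffer assumed to start at address 0; a nonzero base would need separate handling because the reversal applies to the whole 16-bit value.

```python
def bit_reverse16(value):
    """Reverse the order of the 16 bits of value."""
    out = 0
    for _ in range(16):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def yrev_addresses(log2_n):
    """Addresses output by N = 2**log2_n successive YREV instructions,
    assuming Rn starts at 0 and Bm holds 2**(16 - log2_n)."""
    b = 1 << (16 - log2_n)
    r = 0
    out = []
    for _ in range(1 << log2_n):
        out.append(bit_reverse16(r))   # only the output is reversed
        r = (r + b) & 0xFFFF           # Rn <- Rn + Bm, no condition testing
    return out


print(yrev_addresses(3))   # [0, 4, 2, 6, 1, 5, 3, 7]
```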

NOP No Operation 

Prevents any changes to the internal conditions of the AG. All 
I/O pins go to the three-state disable mode. 
Figure 9. PLUS Processing Environmental Flow

MNEMONICS AND OPCODES 

The following list gives the instruction mnemonics and opcodes. 
Various parameters are substituted by the user, defining register 
numbers or control bits. The notation convention is this: 


R    = Address register
B    = Base (offset) register
C    = Compare register
I    = Initialization register
D    = Data bus
CR   = Control register
rrrr = Four-bit address register number
rrr  = Three-bit address register number
bb   = Two-bit base (offset) register number
cc   = Two-bit comparison register number
ii   = Two-bit initialization register number
pp   = Two-bit precision code
x    = One-bit control bit

* External data may substitute for R using DSEL.
† Operable in either pre- or post-update mode.


Figure 5. Typical Loops.

Instr.    Opcode (I9-I0)   Description

Looping Instructions†
YINC:     1011ccrrrr       output & increment/init
YDEC:     1010ccrrrr       output & decrement/init
YADD:     11ccbb1rrr       output & add offset/init
YSUB:     11ccbb0rrr       output & subtract offset/init

Register Transfer Instructions
YRTR*:    000101rrrr       output & xfr R to R
YRTB*:    0011bbrrrr       output & xfr R to B
YRTC*:    0010ccrrrr       output & xfr R to C
DTI:      00001111ii       xfr D to I
ITR:      1000iirrrr       xfr I to R
BTR:      0100bbrrrr       xfr B to R
RTD:      000100rrrr       xfr R to D
CTD:      00001100cc       xfr C to D
BTD:      00001101bb       xfr B to D
ITD:      00001110ii       xfr I to D

Logical and Shift Instructions*†
YOR:      0111bbrrrr       output & OR B with/to R
YAND:     0110bbrrrr       output & AND B with/to R
YXOR:     0101bbrrrr       output & XOR B with/to R
YASR:     000111rrrr       output & arith SR R to R
YLSL:     000110rrrr       output & logical SL R to R

Control Register Instructions
RST:      0000000001       reset CR
DTCR:     0000101110       xfr D to CR
CRTD:     0000101111       xfr CR to D
SETI:     0000100iix       set cond re-init on CMP mode
SETP:     00001010pp       set chip precision
SETY:     000001001x       set Y port to trans/latched mode
SELR:     000001101x       select upper/lower R bank
SELB:     000001100x       select upper/lower B bank
SETU:     000001011x       set post/pre update mode
SETA:     000001010x       set cond AIR mode

AIR Instructions
WRA:      0000101100       write AIR with D
RDA:      0000101101       read AIR at D
LDA:      0000011110       load AIR on next cycle

Misc. Instructions
YDTY:     0000011111       pass D to Y port
YREV*†:   1001bbrrrr       output R in bit-reverse format
NOP:      0000000000       no operation
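Because the opcode list divides the 10-bit instruction space into non-overlapping bit patterns, a simulator's instruction decoder can be driven directly from it. The Python sketch below is one way to do this; the bit patterns are transcribed from the list above (as reconstructed from the scanned data sheet), while the table-driven matching and the names OPCODE_PATTERNS and decode are our own, not taken from the thesis's simulator.

```python
# Opcode patterns (I9..I0) from the MNEMONICS AND OPCODES list.
# '.' marks an operand bit (r, b, c, i, p, or x); all other bits must match.
OPCODE_PATTERNS = [
    ("0000000000", "NOP"), ("0000000001", "RST"),
    ("0000011110", "LDA"), ("0000011111", "YDTY"),
    ("000001001.", "SETY"), ("000001010.", "SETA"),
    ("000001011.", "SETU"), ("000001100.", "SELB"),
    ("000001101.", "SELR"), ("0000100...", "SETI"),
    ("00001010..", "SETP"), ("0000101100", "WRA"),
    ("0000101101", "RDA"), ("0000101110", "DTCR"),
    ("0000101111", "CRTD"), ("00001100..", "CTD"),
    ("00001101..", "BTD"), ("00001110..", "ITD"),
    ("00001111..", "DTI"), ("000100....", "RTD"),
    ("000101....", "YRTR"), ("000110....", "YLSL"),
    ("000111....", "YASR"), ("0010......", "YRTC"),
    ("0011......", "YRTB"), ("0100......", "BTR"),
    ("0101......", "YXOR"), ("0110......", "YAND"),
    ("0111......", "YOR"), ("1000......", "ITR"),
    ("1001......", "YREV"), ("1010......", "YDEC"),
    ("1011......", "YINC"), ("11....1...", "YADD"),
    ("11....0...", "YSUB"),
]


def decode(opcode):
    """Return the mnemonic for a 10-bit opcode, or None if it is unassigned."""
    if not 0 <= opcode < (1 << 10):
        raise ValueError("opcode must fit in 10 bits (I9-I0)")
    bits = format(opcode, "010b")            # bits[0] is I9, bits[9] is I0
    for pattern, mnemonic in OPCODE_PATTERNS:
        if all(p in (".", b) for p, b in zip(pattern, bits)):
            return mnemonic
    return None


print(decode(0b1011000101), decode(0b1100001010), decode(0b0000010000))
# YINC YADD None
```

A full simulator would go on to extract the operand fields from the positions marked '.'; that step is omitted here.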













ANALOG 

DEVICES 



16 X 16-Bit CMOS 
Single Port Multiplier/Accumulator 


FEATURES 

16 x 16-Bit Parallel Multiplication/Accumulation
40-Bit Wide Accumulator with Overflow Flag, Saturation Arithmetic, and Shift-Left Control
Twos Complement or Unsigned Magnitude Inputs 
85ns Multiply/Accumulate Time 
28-Lead Ceramic DIP, Plastic DIP Package, Plastic 
Leaded Chip Carrier, or Leadless Chip Carrier 
350mW Power Dissipation with CMOS Technology 
Specified Over the Extended Temperature Range 
Pin-Compatible with ADSP-1110 

APPLICATIONS 
Digital Filtering 
Fast Fourier Transforms 
Matrix Multiplication 
Microprocessor Acceleration 



WORD-SLICE™ MICROCODED SYSTEM WITH ADSP-1110A


GENERAL INFORMATION 

The ADSP-1110A is a high-speed, low-power single-port 16 x 16-bit multiplier/accumulator (MAC), with processing throughput comparable to existing three-port MACs. Its single-bus structure offers unique advantages: more compact packaging in a 28-pin package, simpler system interface to single-bus peripherals, and significantly reduced cost. In addition, innovative on-chip features extend the ADSP-1110A's capabilities and eliminate external hardware.

All inputs to and outputs from the ADSP-1110A pass through its single 16-bit I/O port. All I/O operations are single cycle. A multiplication or MAC operation requires two cycles to complete, consistent with the two cycles required to load input pairs to the multiplier. An internal pipeline register enables a new input to be loaded while the previous multiplication/accumulation is computed, allowing the device's full 11.7MHz computational bandwidth to be utilized.

A six-bit microcode instruction word governs the ADSP-1110A's operation. The instruction set centers around I/O and multiplication/accumulation operations. Additional instructions allow extra precision in single- and double-precision operations to be obtained efficiently.

Multiplier products are accumulated in a 40-bit wide Multiplier 
Result (MR) register, which consists of a 16-bit MS (Most Sig- 
nificant) and LS (Least Significant) register, and an 8-bit EX 
(Extension) register. Either multiplier input can be a twos com- 
plement or unsigned magnitude number. Overflow from the 
lower 32 bits of the MR into the upper eight guard bits is detected 
and can be monitored externally. Outputs can, conditional upon 
overflow status, be saturated to full scale. An MR register can 
be shifted left by one bit upon output; two independent controls 
allow rounding consistent with output formatting. 


The ADSP-1110A is optimal for applications where board space is limited but the performance of a DSP processor is required. In addition, a microprocessor-based system can realize greater throughput by utilizing the ADSP-1110A in an accelerator.



ADSP-1110A Functional Block Diagram

METHOD OF OPERATION

The ADSP-1110A's operation is controlled by a six-bit microcode instruction and two rounding control pins. Table III presents instructions that are executed by the ADSP-1110A, along with the corresponding six-bit microcode instruction. The sections below further describe the instruction groups presented in Table III.

Input and Multi-Operation Instructions 
A dedicated input instruction (“X = Bus”) loads the X input at 
the rising edge of the clock. The X input is loaded with the 
data that is set up on the device’s 16-bit I/O port. 

A set of multi-operation instructions ("Y = BUS; CKMR; X*Y") is used to load the Y input and otherwise control the ADSP-1110A's multiplier/accumulator. Specifically, at the next rising clock edge, a multi-operation instruction i) loads the Y input; ii) clocks the result of the previous multiplication/MAC operation into the MR; and iii) initiates the next multiplication/MAC operation. The multiplication/MAC operation is initiated at the rising edge of the clock and requires two cycles to complete. The instruction controls needed to govern the device's multiplier array and 40-bit adder during these two cycles are registered internally.

During the first cycle of a multi-operation instruction, the X 
input is transferred to an internal pipeline register (XD), and is 
latched there on the next rising clock edge. Consequently, a 
new X value can be loaded onto the chip during the second 
cycle of the multi-operation instruction. XD will not be overwritten 
until a new X value is loaded. 

The ADSP-1110A supports the following multiplication and multiplication/accumulation operations:

$\pm X \times Y$

and

$\pm X \times Y \pm MR$

Dedicated instructions allow any of the MR's registers to be preloaded with data set up on the device's 16-bit I/O port. This preloading occurs at the rising edge of the clock.

The proper sequence for preloading a value Z into MR and adding it to the product X1*Y1 is:

Instruction                  Comment
1. X = BUS                   Load X1
2. Y = BUS; CKMR; X*Y+MR     Load Y1; clock garbage into MR; initiate MAC
3. LS = BUS                  Preload MR with Z
4. MS = BUS                  Preload MR with Z
5. EX = BUS                  Preload MR with Z
6. X = BUS                   Load X2
7. Y = BUS; CKMR; X*Y+MR     Load Y2; MR = X1*Y1 + Z; initiate next multiplication

This sequence ensures that the value Z preloaded by instructions 3, 4, and 5 is added to the product X1*Y1 and clocked into MR by instruction 7. If Z were preloaded prior to instruction 2, then instruction 2's "CKMR" operation would overwrite the Z value with the product of whatever values were last placed in the multiplier array.

Transfer operations allow one MR register to be moved down to an adjacent one, which is useful in double-precision operations. The ADSP-1110A can, in one cycle, shift the EX to the MS or the MS to the LS register. The shift left extend register (SLE) is a one-bit latch that is loaded with the value of the MSB of the LS register whenever the MS is transferred to LS. The SLE register retains its value until the next downshift of MS into LS overwrites its contents.
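The ordering rule in the preload sequence can be reproduced with a small pipeline model. The Python sketch below is hypothetical and simplified: it assumes the product started by a multi-operation instruction is added to whatever MR holds when the next CKMR clocks it in, collapses the three preload instructions into one preload_mr() call, and ignores signed/unsigned modes, rounding, and the two-cycle latency. The value 999 stands for an arbitrary product left in the multiplier array.

```python
MR_MASK = (1 << 40) - 1   # MR is 40 bits (EX:MS:LS)


class MacPipelineSketch:
    """Hypothetical, simplified model of the ADSP-1110A MR pipeline."""

    def __init__(self, leftover_product=0):
        self.x = 0
        self.mr = 0
        self.pending_product = leftover_product  # whatever is left in the array
        self.pending_accumulate = False

    def load_x(self, value):            # "X = BUS"
        self.x = value

    def preload_mr(self, value):        # "LS = BUS", "MS = BUS", "EX = BUS"
        self.mr = value & MR_MASK

    def ckmr(self):
        """Clock the result of the previous operation into MR."""
        total = self.pending_product
        if self.pending_accumulate:
            total += self.mr
        self.mr = total & MR_MASK

    def multi_op(self, y, accumulate=True):
        """Multi-operation 'Y = BUS; CKMR; X*Y+MR': clock MR, then start the next op."""
        self.ckmr()
        self.pending_product = self.x * y
        self.pending_accumulate = accumulate


def run_sequence(preload_early):
    mac = MacPipelineSketch(leftover_product=999)
    if preload_early:
        mac.preload_mr(100)      # Z loaded before instruction 2 (wrong order)
    mac.load_x(3)                # 1. X = BUS                (X1 = 3)
    mac.multi_op(5)              # 2. Y = BUS; CKMR; X*Y+MR  (Y1 = 5)
    if not preload_early:
        mac.preload_mr(100)      # 3-5. preload Z = 100
    mac.load_x(7)                # 6. X = BUS                (X2 = 7)
    mac.multi_op(2)              # 7. Y = BUS; CKMR; X*Y+MR  MR = X1*Y1 + Z
    return mac.mr


print(run_sequence(False), run_sequence(True))   # 115 1014
```

With the documented order MR ends up as 3*5 + 100 = 115; preloading too early lets instruction 2's CKMR overwrite Z with the leftover product, as the text warns.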

The ADSP-1110A allows either input to be specified as a twos complement or unsigned magnitude number. Table II describes, for all combinations of inputs, the proper interpretation of the MR register if it is output with or without the left-shift option. Note that if the Y input is negative full scale and a negative product is specified, an invalid result is obtained. This happens because the ADSP-1110A will attempt to produce the unrepresentable twos complement of full-scale negative.

The result of a multiplication or MAC operation is latched into the MR register in either of two ways. A dedicated "CKMR" instruction performs this clocking. In addition, all multi-operation instructions clock the MR, eliminating overhead when computing MACs (see Instruction Sequences). It is important to note that whenever "CKMR" is executed, it clocks the result of the previous operation into the MR. Also, in all cases, the clocking of the MR occurs at the rising edge of the clock.

MR Register Instructions 

A number of the ADSP-1110A's instructions affect the contents of the MR register, including preload instructions, transfer instructions, and sign extend instructions. In addition, special output instructions allow for format adjusting the MR upon output.

The 40-bit accumulator of the ADSP-1110A is segmented into three registers: a 16-bit most significant product register (MS); a 16-bit least significant product register (LS); and an 8-bit extended product register (EX) (see Table II). The eight guard bits of the EX allow at least 256 multiplication/accumulations without risk of overflow.
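For twos complement inputs, the figure of 256 can be checked with a short bound. Each 16-bit twos complement input has magnitude at most 2^15, so a single product has magnitude at most 2^30, and 256 such products stay inside the range of a 40-bit twos complement MR. This derivation covers only the twos complement case; mixed and unsigned inputs produce larger products and are not addressed by it.

```latex
\begin{aligned}
|X| \le 2^{15},\ |Y| \le 2^{15} \;&\Rightarrow\; |X Y| \le 2^{30} \\
\Bigl|\sum_{k=1}^{256} X_k Y_k\Bigr| &\le 256 \cdot 2^{30} = 2^{38} \\
2^{38} &\le 2^{39} - 1 \quad \text{(largest positive value of a 40-bit twos complement MR)}
\end{aligned}
```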


Anytime the result of a multiplication or multiplication/accumu- 
lation operation is clocked into the accumulator, the result is 
automatically sign extended into the upper MSBs of the ac- 
cumulator. In addition, explicit instructions allow the MSB of 
the LS to be sign extended to the MS (“MS = SIGN EXT 
LS”) or the MSB of the MS to be sign extended to the EX 
register (“EX = SIGN EXT MS”). Such sign extend capability 
may be needed to properly initialize the MR after the MS or LS 
is preloaded, or after an MR register transfer. 

Output Instructions 

Output instructions allow any MR register to be read. When written onto the ADSP-1110A's 16-bit bus, the 8-bit EX register is automatically sign-extended into the upper 8 MSBs of the bus. Standard output instructions of the ADSP-1110A are supplemented with two important options: a shift-left capability and conditional saturation.

The ADSP-lllOA’s output instructions include the ability to 
shift any MR register (EX, MS, or LS) left by one bit upon 
output. This shift does not affect the contents of MR, but does 
affect what appears on the ADSP-lllOA’s 16-bit I/O port. Figure 
6 shows which bits of the 40-bit wide MR register are output if 
the shift-left option is invoked. 

Figure 6. Effect of Left Shift on MR Outputs (without and with shift-left)




The shift-left-on-output control, which scales up the MR outputs
by a factor of two, is useful under many circumstances. Twos
complement multiplication, for all but one case (negative full-scale
times negative full-scale), results in redundancy in the two
MSBs of the 32-bit product. This redundancy means that the
16-bit MS register contains two identical sign bits (bits 31 and
30 of MR) and just 14 bits of magnitude. The ADSP-1110A’s
shift-left control allows full precision in twos complement
operations to be attained. Left shift control also provides a means for
maximizing resolution when using block floating point, when
downscaling twos complement results, and when upscaling
mixed and unsigned magnitude results.
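The effect of the shift-left option on an MS output can be shown with a small Python sketch. It assumes both inputs are in 1.15 fractional format and models the MR as a 40-bit unsigned bit pattern; `bus_ms` is a hypothetical helper, not a device function.

```python
def bus_ms(mr: int, shift_left: bool = False) -> int:
    """16 bits placed on the bus by "BUS = MS" or "BUS = MS (sl)".

    Without shift-left the bus carries MR bits 31..16; with shift-left it
    carries bits 30..15. The MR itself is not changed.
    """
    lsb = 15 if shift_left else 16
    return (mr >> lsb) & 0xFFFF


# 0.5 x 0.5 with both inputs in 1.15 fractional format (0x4000 each).
mr = 0x4000 * 0x4000            # 0x1000_0000: bits 31 and 30 are identical sign bits
print(hex(bus_ms(mr)))                    # 0x1000 -> 0.25 read as 2.14
print(hex(bus_ms(mr, shift_left=True)))   # 0x2000 -> 0.25 read as 1.15
```

Without the shift the MS word carries only 14 magnitude bits, so reading it as a 1.15 number gives 0.125; the shifted output restores the 1.15 scaling.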

Whenever the RND14 pin is asserted during a “BUS = LS
(sl)” or “BUS = LS (sl, sat)” instruction, the SLE bit is
appended to the upper 15 bits of the shifted LS. If RND14
is low, however, a zero is inserted into the LSB of the shifted LS.
Appending the SLE bit to the shifted LS provides an extra bit
of precision in applications such as double-precision
multiplication/accumulations.


Round Controls 

The RND14 and RND15 pins are two independent controls
that allow rounding consistent with shifted or unshifted outputs,
respectively. The round control signals are latched at the rising
clock edge whenever the device receives a multiplication or
MAC instruction. Asserting the RND15 (RND14) pin causes
a 1 to be added to bit 15 (bit 14) of the LS. The rounding
does not occur until the subsequent cycle, in which the result of
the multiplication or MAC operation is clocked into the MR.
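A Python sketch of round-to-nearest with the two round controls follows. It ignores timing (on the device the addition happens in the cycle when the result is clocked into the MR) and models the MR as a 40-bit unsigned pattern; `clock_into_mr` and `ms_out` are hypothetical names.

```python
MR_MASK = (1 << 40) - 1


def clock_into_mr(product: int, rnd15: bool = False, rnd14: bool = False) -> int:
    """Value latched into the 40-bit MR when the rounding pins were asserted.

    RND15 adds 1 at bit 15 of the LS and RND14 adds 1 at bit 14.
    """
    if rnd15:
        product += 1 << 15
    if rnd14:
        product += 1 << 14
    return product & MR_MASK


def ms_out(mr: int, shift_left: bool = False) -> int:
    """MS as seen on the bus: bits 31..16, or bits 30..15 when shifted."""
    return (mr >> (15 if shift_left else 16)) & 0xFFFF


p = 0x0001_8000                    # MS = 1, LS = 0x8000: exactly half an MS LSB
print(ms_out(clock_into_mr(p)))                    # 1 (truncated)
print(ms_out(clock_into_mr(p, rnd15=True)))        # 2 (rounded)
# RND14 pairs with a shifted output, whose LSB is MR bit 15.
print(ms_out(clock_into_mr(0x4000, rnd14=True), shift_left=True))  # 1
```

Adding half of the output LSB before truncation is what turns the truncating MS read into rounding; RND14 plays the same role for left-shifted outputs because their LSB sits one bit lower.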


Overflow and Saturation 

The ADSP-1110A’s overflow flag monitors 9 bits (the 8 bits of the EX
register and the MSB of the MS register). If any bit in EX
differs from the MSB of MS, then an overflow has occurred
from the MS into the EX, and the overflow flag is asserted (HI)
following an output instruction. Generally, the status of the
overflow flag reflects the current contents of the MR and is
updated each time a new result is clocked into the MR. However,
if the MS register is output with a left-shift, the overflow logic
determines whether the shifted MS overflows into the EX (when
bits 38 through 30 are not identical) and sets the flag accordingly. On
the cycle following any left-shifted output, the overflow flag
status reverts to reflect the contents of the MR. During cycles
when a non-output instruction is executed, the overflow flag is
always LO.

[Table II: bit weights of the X and Y inputs (b15 to b0) and of the EX register (b39 to b32) for twos complement, unsigned magnitude and mixed-mode data, in integer and fractional form, with and without shift-left (sl)]
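The overflow rule above can be written as a short Python function. This is a sketch only: it models the value OVF takes during an output instruction and ignores the rule that the flag is LO during non-output cycles; the function name and the 40-bit unsigned MR representation are assumptions.

```python
def overflow_flag(mr: int, shift_left_ms: bool = False) -> bool:
    """OVF for a 40-bit MR bit pattern, following the description above.

    Normally bits 39..31 (all of EX plus the MSB of MS) are compared; when
    MS is being output with a left shift, bits 38..30 are compared instead.
    OVF is HI when the compared bits are not all identical.
    """
    low = 30 if shift_left_ms else 31
    bits = (mr >> low) & 0x1FF          # the nine monitored bits
    return bits not in (0x000, 0x1FF)


print(overflow_flag(0x7FFF_FFFF))                       # False: positive full scale
print(overflow_flag(0x8000_0000))                       # True: carried into EX
print(overflow_flag(0x4000_0000))                       # False
print(overflow_flag(0x4000_0000, shift_left_ms=True))   # True
```

The last two calls show why a value that is safe to read unshifted can still raise OVF when MS is output with a left shift.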

Serious data glitches can result from wraparound effects due to
overflow in long multiply/accumulate chains. For example, if a
positive number is added to positive full-scale, the lower 32 bits
of the MR register will overflow into the 8-bit EX register. Simply
reading the MS register will then yield a negative twos complement
number. To prevent this wraparound, the ADSP-1110A can,
conditioned on overflow status, saturate an output to twos
complement full scale.

The ADSP-1110A’s saturation logic operates only on output
values; it has no effect on the contents of the MR register. This
logic examines the sign of the MR (bit 39, the MSB of the EX
register) and the overflow status. As Table IV indicates, the low
32 bits of the MR are saturated to full-scale positive (negative) if
overflow has occurred in a positive (negative) MR.
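The saturation rule can be sketched in Python as follows. The overflow status is passed in as an argument to keep the sketch independent of the overflow logic; the function name and the 40-bit unsigned MR representation are assumptions.

```python
from typing import Tuple


def saturated_output(mr: int, ovf: bool) -> Tuple[int, int]:
    """(MS, LS) driven onto the bus by a saturating output, per Table IV.

    mr is the 40-bit MR as an unsigned bit pattern and ovf the overflow status.
    Only the output is affected; the MR itself is left unchanged.
    """
    if not ovf:                         # no overflow: no change
        return (mr >> 16) & 0xFFFF, mr & 0xFFFF
    if mr & (1 << 39):                  # negative MR: negative full scale
        return 0x8000, 0x0000
    return 0x7FFF, 0xFFFF               # positive MR: positive full scale


# Positive full scale (0x7FFF_FFFF) plus 1 wraps the low 32 bits negative ...
mr = 0x7FFF_FFFF + 1
print(hex((mr >> 16) & 0xFFFF))         # 0x8000: a plain MS read looks negative
# ... but a saturating output returns positive full scale instead.
print([hex(w) for w in saturated_output(mr, ovf=True)])   # ['0x7fff', '0xffff']
```

Because the decision uses bit 39 rather than bit 31, the sign of the true (40-bit) result selects the full-scale value, not the wrapped 32-bit value.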

Either the MS or LS register can be left-shifted on output with
conditional saturation. If the shifted value overflows the lower
32 bits, the output result will be saturated to full scale.

While the saturation control protects against overflow from the 
MS to the EX register, the user is not protected in the event the 
accumulated result overflows the entire 40-bit MR register. 


OVF   MR Bit 39      Output Value with Saturation
      (Sign Bit)     MS                  LS

0     0              No Change           No Change
0     1              No Change           No Change
1     0              0111111111111111    1111111111111111
1     1              1000000000000000    0000000000000000


Table IV. Overflow and Saturation Circuitry Conditions



Instruction      Instruction                    Microcode   Comments
Group                                           I5-I0

Miscellaneous    NOP                            0000XX      No Operation
                 CKMR                           0001XX      Clock MR
Input            X = BUS                        0010XX
Preload          LS = BUS                       010000
                 MS = BUS                       0101X0
                 EX = BUS                       010010
Transfer         LS = MS                        010001      Sets SLE register
                 MS = EX                        010101
Sign Extend      EX = SIGN EXT MS               010011
                 MS = SIGN EXT LS               010111
Output           BUS = EX                       001101      All output instructions are asynchronous.
                 BUS = EX (sl)                  001100      I5-I2: 0011 = EX
                 BUS = MS                       011001             0110 = MS
                 BUS = MS (sl)                  011000             0111 = LS
                 BUS = MS (sat)                 011011      I1-I0: 01 = to bus
                 BUS = MS (sl, sat)             011010             00 = to bus shifted
                 BUS = LS                       011101             10 = to bus shifted w/saturation
                 BUS = LS (sl)                  011100             11 = to bus w/saturation
                 BUS = LS (sat)                 011111
                 BUS = LS (sl, sat)             011110
Multi-Operation  Y = BUS; CKMR; Xus*Yus         100X00      Require two cycles to complete.
                 Y = BUS; CKMR; -Xus*Yus        100X01      Other instructions can be executed
                 Y = BUS; CKMR; Xus*Yus +MR     100010      on the second cycle.
                 Y = BUS; CKMR; -Xus*Yus +MR    100011
                 Y = BUS; CKMR; Xus*Yus -MR     100110      I5 = Multiplication/MAC operation
                 Y = BUS; CKMR; -Xus*Yus -MR    100111      I4 = Y twos complement
                 Y = BUS; CKMR; Xtc*Yus         101X00      I3 = X twos complement
                 Y = BUS; CKMR; -Xtc*Yus        101X01      I2 = Subtract previous result
                 Y = BUS; CKMR; Xtc*Yus +MR     101010      I1 = Add/subtract previous result
                 Y = BUS; CKMR; -Xtc*Yus +MR    101011           from product
                 Y = BUS; CKMR; Xtc*Yus -MR     101110      I0 = Negate product
                 Y = BUS; CKMR; -Xtc*Yus -MR    101111
                 Y = BUS; CKMR; Xus*Ytc         110X00
                 Y = BUS; CKMR; -Xus*Ytc        110X01
                 Y = BUS; CKMR; Xus*Ytc +MR     110010
                 Y = BUS; CKMR; -Xus*Ytc +MR    110011
                 Y = BUS; CKMR; Xus*Ytc -MR     110110
                 Y = BUS; CKMR; -Xus*Ytc -MR    110111
                 Y = BUS; CKMR; Xtc*Ytc         111X00
                 Y = BUS; CKMR; -Xtc*Ytc        111X01
                 Y = BUS; CKMR; Xtc*Ytc +MR     111010
                 Y = BUS; CKMR; -Xtc*Ytc +MR    111011
                 Y = BUS; CKMR; Xtc*Ytc -MR     111110
                 Y = BUS; CKMR; -Xtc*Ytc -MR    111111

Mnemonic Definitions

=       Assign right side to left.
sl      Shift left.
BUS     16-bit external data bus used for all I/O operations.
sat     Conditional on overflow, saturate the outputted value.
X       Input register for multiplier.
TC      Twos complement number.
Y       Input register for multiplier.
US      Unsigned magnitude number.
EX      8-bit extension register for accumulator.
SIGN    Sign bit (MSB) of specified register.
MS      16-bit most significant product register.
CKMR    Clock product into EX, MS, and LS.
LS      16-bit least significant product register.
*       Multiply.
MR      40-bit accumulator comprising EX, MS and LS.
X       (in a microcode field) Microcode instruction bit can be either a 0 or 1.


Table III. ADSP-1110A Instruction Set
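The bit-field comments in Table III are enough to decode any 6-bit instruction word. The Python sketch below does this mechanically; the table names and the `decode` function are hypothetical, and the mnemonic strings simply follow the spelling used in Table III.

```python
# Field meanings taken from the comments in Table III.
PRELOAD_TRANSFER = {
    0b010000: "LS = BUS",
    0b010100: "MS = BUS",          # 0101X0
    0b010110: "MS = BUS",
    0b010010: "EX = BUS",
    0b010001: "LS = MS",
    0b010101: "MS = EX",
    0b010011: "EX = SIGN EXT MS",
    0b010111: "MS = SIGN EXT LS",
}
OUTPUT_REGISTER = {0b0011: "EX", 0b0110: "MS", 0b0111: "LS"}                    # I5-I2
OUTPUT_OPTION = {0b01: "", 0b00: " (sl)", 0b10: " (sl, sat)", 0b11: " (sat)"}  # I1-I0


def decode(instr: int) -> str:
    """Return the Table III mnemonic for a 6-bit instruction word I5..I0."""
    if not 0 <= instr <= 0b111111:
        raise ValueError(f"not a 6-bit instruction: {instr!r}")
    if instr & 0b100000:                                # I5: multiply / MAC
        x = "Xtc" if instr & 0b001000 else "Xus"        # I3: X twos complement
        y = "Ytc" if instr & 0b010000 else "Yus"        # I4: Y twos complement
        term = ("-" if instr & 0b000001 else "") + f"{x}*{y}"   # I0: negate product
        if instr & 0b000010:                            # I1: combine with MR
            term += " -MR" if instr & 0b000100 else " +MR"      # I2: subtract
        return f"Y = BUS; CKMR; {term}"
    group = instr >> 2                                  # I5..I2
    if group == 0b0000:
        return "NOP"
    if group == 0b0001:
        return "CKMR"
    if group == 0b0010:
        return "X = BUS"
    if group in OUTPUT_REGISTER:
        return f"BUS = {OUTPUT_REGISTER[group]}{OUTPUT_OPTION[instr & 0b11]}"
    return PRELOAD_TRANSFER[instr]                      # groups 0100 and 0101


print(decode(0b111010))   # Y = BUS; CKMR; Xtc*Ytc +MR
print(decode(0b011010))   # BUS = MS (sl, sat)
```

Every word with I5 set decodes as a multi-operation instruction and never as an output, which agrees with the power-up advice of holding I5 HI.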



CLOCK AND TIMING 

Figure 1 presents a timing diagram for the ADSP-1110A’s
operation.

Input data, round controls, and non-output instructions are
clocked (synchronous); set-up and hold times are specified
accordingly. All multi-operation (two cycle) instructions are clocked,
and the internal controls needed for the second cycle are latched
internally.

Unlike all other ADSP-1110A instructions, output operations
are asynchronous. The relevant timing specification is the delay
between control inputs and valid outputs. The use of saturation
(sat) slows down the availability of a valid output on the
ADSP-1110A’s I/O bus; delay times are specified accordingly.

The ADSP-1110A’s OVF (overflow) flag is set according to the
contents of the MR register. However, upon outputting the MR
with the shift-left control, the OVF flag may be modified if the
left shift causes overflow. The relevant timing for this case is
specified.

The ADSP-1110A’s output three-state drivers are not disabled until
tDIS ns after an output instruction is removed. Since the
ADSP-1110A has just one I/O port, bus contention can occur
when an ADSP-1110A input immediately follows an output.

For example, an input source (e.g., a data RAM) enabled to
drive the bus immediately after an ADSP-1110A output creates
the possibility that both drivers are active simultaneously. There
are two ways to avoid such conflicts:

1. Set up output instructions well in advance of the clock’s
rising edge (more than the tD set-up time), enabling the data output to
complete in time for the data to be latched at the clock edge.
Allow tDIS ns after the clock edge before enabling a different
device to drive the bus. Note that any system that provides
the ADSP-1110A with its instruction from a pipeline register
operates in this way. The Hardware Implementations with the
ADSP-1110A section describes several alternative
implementations consistent with this approach.

2. For systems with minimal instruction set-up time, an operation
that doesn’t use the bus (e.g., a NOP) may need to be inserted
after an output instruction. The reason for this is as follows.
Output instructions must be held valid for tD ns, which
means that, if instruction set-up time is minimal, output
instructions must be held beyond the rising edge of the clock.
After the output instruction is removed, another tDIS elapses
before the output drivers are disabled. As a result, the three-state
output drivers are active well into the next cycle. If the
bus is driven with an input in the next cycle, bus contention
may occur.

Instruction Sequences 

With the ADSP-1110A, single multiplication operations involve
three overhead statements in addition to the multiply command,
as Figure 7 illustrates.

While a multiplication/accumulation sequence is structurally
similar to a single multiplication, overhead as a percentage of
computation time is reduced substantially. In the instruction
flow diagram shown in Figure 8, a NOP is needed only in the
final multiplication/accumulation operation. Also, new X values
are loaded as multiplication/accumulation instructions complete.
In this sequence, the three cycles of overhead can be spread out
over as many multiplication/accumulations as are performed
consecutively.

For a series of multiplication/accumulation sequences, I/O
operations can be further overlapped. At the end of each
multiplication/accumulation string, a new string is initiated. In this
instance, overhead cycles become negligible in importance; the
multiplication/accumulation rate of the ADSP-1110A approaches 11 MHz.
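The effect of spreading the fixed overhead over a MAC string can be estimated with simple arithmetic. This Python sketch assumes a constant three overhead cycles per string, as the description of Figures 7 and 8 suggests, and ignores the further overlap between consecutive strings; the function names are hypothetical.

```python
def chain_cycles(n_macs: int, overhead: int = 3) -> int:
    """Approximate cycles for one string of n_macs multiply/accumulates."""
    if n_macs < 1:
        raise ValueError("need at least one MAC")
    return n_macs + overhead


def utilisation(n_macs: int) -> float:
    """Fraction of cycles that perform a multiply/accumulate."""
    return n_macs / chain_cycles(n_macs)


for n in (1, 8, 64):
    print(n, chain_cycles(n), f"{utilisation(n):.0%}")
# 1 4 25%
# 8 11 73%
# 64 67 96%
```

A single multiply uses only a quarter of its cycles for the multiplication itself, while a 64-term string is already above 95 percent, which is why long strings approach the full MAC rate.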



Figure 7. Multiply Operation Timing




Figure 8. Multiply/Accumulate Operation Timing 


Avoid Bus Contentions 

Because the ADSP-lllOA typically shares its data port with 
other devices on a common bus, there is a potential for bus 
contentions' at power-up. If the instruction applied to the ADSP- 
lllOA at power-up is random, the multiplier/accumulator could 
be in an output state. If any other devices are driving the bus at 
the same time, there will be a bus contention. 

The obvious solution is to make sure no other devices are driving 
the common data bus at power-up. Another approach is to force 
instruction bit I 5 (pin 18) HI at power-up. This guarantees that 
the ADSP-lllOA will not be in an output state because ADSP- 
lllOA output instructions are asynchronous and ail have a zero 
in instruction bit 5 (I 5 ). 

HARDWARE IMPLEMENTATIONS WITH THE
ADSP-1110A

There are many alternative ways of implementing high-performance
DSP systems with the ADSP-1110A. The following sections
illustrate some of the more commonly used approaches: a
microcoded system, a ROM-based sequential machine, a PLA-based
state machine, and a device directly interfaced to a microprocessor.
The optimal implementation will depend on the performance, price,
and board area requirements of the design.

Microcoded System 

Many microcoded systems have the design objective of fast
number crunching while minimizing microcode bits and circuit
board area. The ADSP-1110A single-port MAC, with just 8
control bits, its single bus structure, and fast cycle time, helps
meet these objectives. The ADSP-1110A can be simply connected
to the processor data bus and microcode instruction field to
provide powerful multiplier/accumulator functions.


A typical Word-Slice™ processor with the ADSP-1110A is
shown below:



Figure 9. 


In most bit-slice designs, the control bits from the microcode
memory are latched in a pipeline register. In the above
implementation, the ADSP-1110A and all miscellaneous logic are
used in conjunction with an external pipeline latch. The pipeline
latch guarantees that the microcode bits controlling the circuitry
are valid for a complete cycle (see timing diagram below). Note
that the ADSP Word-Slice™ components (the ADSP-1401 and
ADSP-1410) contain an internal pipeline register and are fed
directly from the microcode.










APPENDIX D 


; THIS DEF. FILE IS FOR THE MICROCODED SYSTEM 1
TITLE MICROSYSTEM
DEFN_FILE "SMACRO.DAT"

WORD WIDTH 49 BITS
DATA_LOC_WIDTH 16
PROGRAM_LOC_WIDTH 49
;1

#sequencer: FIELD 42-48, WIDTH 7, DEFAULT cont
OPCODE_BIT_NOTATIONS [
kk ( ;Conditions
00 unconditional
01 notflag
10 flag
11 sign
)

cc ( ;Selects the relevant register (R3-R0) and
C0-C3 ;/or counter (C3-C0).
)

ii ( ;Decides number to be added in case of
I0-I3 ;AIRSP instruction.
)

]

INSTR_NOT_AVAILABLE [
jrc(sign) (c0)
jrc(sign) (c1)
jrc(sign) (c2)
jrc(sign) (c3)
branch(sign) (c0)
branch(sign) (c1)
branch(sign) (c2)
branch(sign) (c3)

]

BRANCH_INSTR_WHICH_NEED_DATA [

ABSOLUTE (
jsa
jda
jdrst
)

RELATIVE (
jsr
jdr
)

]

VALUES [

; jump & branch instruction
jpcof 0010101
jpcnf 0110101
jtwo 101kk01
jda 111kk11
jdr 111kk01
jdi 101kk10
jdrst 10011cc
jrs 11011cc
jsa 111kk00
jsr 111kk10
rtn 101kk11
branch 100kkcc

; stack operation
; subroutine stack
psdss 0011110
ppssd 0111110
wrssp 0001110
rdssp 0101100
dssp 0000010

; register stack
sgsp 0000111
slsp 0000110
rdrsp 0101111
wrrsp 0001100
pspc 0100011
psgsp 0000101
ppgsp 0000100
psdrs 0011111
pprsd 0111111
airsp 01010ii
sirsp 0001111
s4rsp 0111100

; status register operations:
rdsr 0101110
wrsr 0011100
pssr 0100001
ppsr 0100010

; counter operations:
wrcntr 01110cc
clrs 0010100
sets 0110100
pscntr 00010cc
ppcntr 00110cc
dccntr 01100cc
ifcdec 101kk00

; interrupt control:
ccir 0010001
cair 0000001
rtnir 0000011
rdiv 0101101
wriv 0001101
irmbc 0010011
irmbs 0010010
disir 0010110
enair 0110110
slir 0010111
stir 0110111
slrivp 0011101

; relative address width controls:
rel16 0100100
rel12 0100111
rel8 0100110

; miscellaneous instructions:
cont 0000000
idle 0010000
ihc 0100101
wcs 0100000

]

;2

#data: FIELD 26-41, WIDTH 16, DEFAULT zero

VALUES [

zero 0000000000000000
]

;3

#DATA_ENABLE: FIELD 25-25, WIDTH 1, DEFAULT disable
VALUES [
enable 1
disable 0
]

;4

#address_generator: FIELD 15-24, WIDTH 10, DEFAULT nop

OPCODE_BIT_NOTATIONS [
cc ( ;Comparison register number
c0-c3
)

rrr ( ;Three bit address register number
R0-R7
)

rrrr ( ;Four bit address register number
r0-r15
)

bb ( ;Base (offset) register number
b0-b3
)

ii ( ;Initialisation register number
i0-i3
)

pp ( ;Two bit precision code
p0-p3
)

x ( ;One bit control bit
x0-x1
)

]

VALUES [

; looping instructions
yinc 1011ccrrrr
ydec 1010ccrrrr
yadd 11ccbb1rrr
ysub 11ccbb0rrr

; register transfer instructions:
yrtr 000101rrrr
yrtb 0011bbrrrr
yrtc 0010ccrrrr
dti 00001111ii
itr 1000iirrrr
btr 0100bbrrrr
rtd 000100rrrr
ctd 00001100cc
btd 00001101bb
itd 00001110ii

; logical and shift instructions
yor 0111bbrrrr
yand 0110bbrrrr







yxor 0101bbrrrr
yasr 000111rrrr
ylsl 000110rrrr

rst 0000000001
dtcr 0000101110
crtd 0000101111
seti 0000100iix
setp 00001010pp
sety 000001001x
selr 000001101x
selb 000001100x
setu 000001011x
seta 000001010x

wra 0000101100
rda 0000101101
lda 0000011110

ydty 0000011111
yrev 1001bbrrrr
nop 0000000000
]

;5

#ag_gen_pin: FIELD 14-14, WIDTH 1, DEFAULT nodsel

VALUES [
nodsel 0
dsel 1
]

#ag_air_select: FIELD 13-13, width 1, default noair
VALUES [
air 1
noair 0
]

;7

#data_mem_control: FIELD 11-12, WIDTH 2, default norw
VALUES [
rd 11
wr 10
norw 00
]

;8

#mac: FIELD 5-10, WIDTH 6, default nop
VALUES [
nop 000000
ckmr 000100
x=bus 001000
ls=bus 010000
ms=bus 010100
ex=bus 010010
ls=ms 010001
ms=ex 010101
ex=signextms 010011
ms=signextls 010111
bus=ex 001101
bus=ex(sl) 001100
bus=ms 011001
bus=ms(sl) 011000
bus=ms(sat) 011011
bus=ms(slsat) 011010
bus=ls 011101
bus=ls(sl) 011100
bus=ls(sat) 011111
bus=ls(slsat) 011110
y=bus_ckmr_xus*yus 100000
y=bus_ckmr_-xus*yus 100001
y=bus_ckmr_xus*yus+mr 100010
y=bus_ckmr_-xus*yus+mr 100011
y=bus_ckmr_xus*yus-mr 100110
y=bus_ckmr_-xus*yus-mr 100111
y=bus_ckmr_xtc*yus 101000
y=bus_ckmr_-xtc*yus 101001
y=bus_ckmr_xtc*yus+mr 101010
y=bus_ckmr_-xtc*yus+mr 101011
y=bus_ckmr_xtc*yus-mr 101110
y=bus_ckmr_-xtc*yus-mr 101111
y=bus_ckmr_xus*ytc 110000
y=bus_ckmr_-xus*ytc 110001
y=bus_ckmr_xus*ytc+mr 110010
y=bus_ckmr_-xus*ytc+mr 110011
y=bus_ckmr_xus*ytc-mr 110110
y=bus_ckmr_-xus*ytc-mr 110111
y=bus_ckmr_xtc*ytc 111000
y=bus_ckmr_-xtc*ytc 111001
y=bus_ckmr_xtc*ytc+mr 111010
y=bus_ckmr_-xtc*ytc+mr 111011
y=bus_ckmr_xtc*ytc-mr 111110
y=bus_ckmr_-xtc*ytc-mr 111111
]

;9

#mac_rnd_pins: FIELD 3-4, WIDTH 2, DEFAULT nornd
VALUES [
rnd14 01
rnd15 10
nornd 00
]

;10

#mac_controls: FIELD 2-2, WIDTH 1, DEFAULT macdisable
VALUES [
macenable 1
macdisable 0
]

;11

#latch_control: FIELD 0-1, WIDTH 2, DEFAULT norwlatch
VALUES [
rdlatch 11
wrlatch 10
norwlatch 00
]
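To see how the FIELD declarations of this definition file fit together, the Python sketch below packs named field values into one 49-bit microword. Only the bit ranges come from the FIELD lines; the `pack` helper and the dictionary keys are hypothetical, and fields that are not named stay at zero, which matches their all-zero DEFAULT values. The sketch makes no attempt to reproduce the digit layout of the object files that follow.

```python
WORD_WIDTH = 49

# Inclusive bit ranges (bit 0 = LSB) from the FIELD lines of microcoded system 1.
FIELDS = {
    "sequencer": (42, 48),
    "data": (26, 41),
    "data_enable": (25, 25),
    "address_generator": (15, 24),
    "ag_gen_pin": (14, 14),
    "ag_air_select": (13, 13),
    "mac": (5, 10),
    "mac_rnd_pins": (3, 4),
    "mac_controls": (2, 2),
    "latch_control": (0, 1),
}


def pack(**values: str) -> int:
    """Place binary-string field values into one microword; others stay 0."""
    word = 0
    for name, bits in values.items():
        lo, hi = FIELDS[name]
        width = hi - lo + 1
        if len(bits) != width or set(bits) - {"0", "1"}:
            raise ValueError(f"{name}: expected {width} binary digits, got {bits!r}")
        word |= int(bits, 2) << lo
    return word


# One MAC step: mac = y=bus_ckmr_xtc*ytc+mr (111010), mac_rnd_pins = rnd15 (10),
# mac_controls = macenable (1).
w = pack(mac="111010", mac_rnd_pins="10", mac_controls="1")
print(f"{w:013X}")   # 0000000000754
```

Checking each value's width against its FIELD declaration catches the most common hand-assembly mistake, a value written for the wrong field.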



; Object file for 1-D convolution


$h

d0050 0001
d0051 0002
d0052 0003
d0053 0004
d0054 0005
p0050 000001421E0005
p0051 00000402988000
p0052 00000542990000
p0053 00000002000203
p0054 00300006000000
p0055 0078016A000000
p0056 00E40CMA000000
p0057 00E0000F000000
p0058 00000001601900
p0059 01000163459C40
p005A 00C4001B9C8000
p005B 00000002000403
p005C 00000000000000
p005D 017400017113A4
p005E 01C0015E000000


; Object file for MATRIX multiplication

p0100 00000142604000
p0101 000004021Ea000
p0102 00000542990000
p0103 00000012684000
p0104 00000002000203

p0105 00300006000000



pOlOe O0E40O130Qa0OO
P0109 CXDEOOOOABOOOOO
p010A 00000001601900
p010B OlO0O42B6a9O«IO
p010C 00000000000000
p010D 00000002000403
p010E 00000000000000
p010F 017400017113A4
p0110 01C00426000000
p0111 00C80000Q28000
p0112 01740001aEB000
p0113 01C00422628000
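The OPCODE_BIT_NOTATIONS blocks in the definition files name letter pairs (kk, cc, ii, and so on) that stand for operand bits inside a VALUES template such as "branch 100kkcc". The Python sketch below shows one plausible reading of that substitution; the `expand` and `reg` helpers are hypothetical and are not the assembler used in this work.

```python
# Condition codes from the kk notation of the sequencer field.
KK = {"unconditional": "00", "notflag": "01", "flag": "10", "sign": "11"}


def expand(template: str, **operands: str) -> str:
    """Replace each notation (e.g. kk, cc) in a VALUES template by its bits."""
    out = template
    for name, bits in operands.items():
        if len(name) != len(bits) or name not in out:
            raise ValueError(f"cannot substitute {name}={bits!r} into {template!r}")
        out = out.replace(name, bits, 1)
    if set(out) - {"0", "1"}:
        raise ValueError(f"unresolved notation left in {out!r}")
    return out


def reg(n: int, width: int = 2) -> str:
    """Register/counter number as a fixed-width binary string (c2 -> '10')."""
    return format(n, f"0{width}b")


print(expand("100kkcc", kk=KK["flag"], cc=reg(2)))   # 1001010
print(expand("100kkcc", kk=KK["sign"], cc=reg(2)))   # 1001110
```

Under this reading, branch with kk = sign produces 10011cc, the same pattern already assigned to jdrst, which would explain why the branch(sign) forms appear under INSTR_NOT_AVAILABLE.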



APPENDIX E 


; THIS DEF. FILE IS FOR THE MICROCODED SYSTEM 2
TITLE MICROSYSTEM
DEFN_FILE "SMACRO1.DAT"

WORD WIDTH 60 BITS
DATA_LOC_WIDTH 16
PROGRAM_LOC_WIDTH 60
;1

#sequencer: FIELD 53-59, WIDTH 7, DEFAULT cont
OPCODE_BIT_NOTATIONS [
kk ( ;Conditions
00 unconditional
01 notflag
10 flag
11 sign
)

cc ( ;Selects the relevant register (R3-R0) and
C0-C3 ;/or counter (C3-C0).
)

ii ( ;Decides number to be added in case of
I0-I3 ;AIRSP instruction.
)

]

INSTR_NOT_AVAILABLE [
jrc(sign) (c0)
jrc(sign) (c1)
jrc(sign) (c2)
jrc(sign) (c3)
branch(sign) (c0)
branch(sign) (c1)
branch(sign) (c2)
branch(sign) (c3)

]

BRANCH_INSTR_WHICH_NEED_DATA [
ABSOLUTE (
jsa
jda
jdrst
)

RELATIVE (
jsr
jdr
)

]

VALUES [

; jump & branch instruction
jpcof 0010101
jpcnf 0110101
jtwo 101kk01
jda 111kk11
jdr 111kk01
jdi 101kk10
jdrst 10011cc
jrs 11011cc
jsa 111kk00
jsr 111kk10
rtn 101kk11
branch 100kkcc

; stack operation
; subroutine stack
psdss 0011110
ppssd 0111110
wrssp 0001110
rdssp 0101100
dssp 0000010

; register stack
sgsp 0000111
slsp 0000110
rdrsp 0101111
wrrsp 0001100
pspc 0100011
psgsp 0000101
ppgsp 0000100
psdrs 0011111
pprsd 0111111
airsp 01010ii
sirsp 0001111
s4rsp 0111100

; status register operations:
rdsr 0101110
wrsr 0011100
pssr 0100001
ppsr 0100010

; counter operations:
wrcntr 01110cc
clrs 0010100
sets 0110100
pscntr 00010cc
ppcntr 00110cc
dccntr 01100cc
ifcdec 101kk00

; interrupt control:
ccir 0010001
cair 0000001
rtnir 0000011
rdiv 0101101
wriv 0001101
irmbc 0010011
irmbs 0010010
disir 0010110
enair 0110110
slir 0010111
stir 0110111
slrivp 0011101

; relative address width controls
rel16 0100100
rel12 0100111
rel8 0100110

; miscellaneous instructions:
cont 0000000
idle 0010000
ihc 0100101
wcs 0100000

]

#data: FIELD 37-52, WIDTH 16, DEFAULT zero
VALUES [

zero 0000000000000000
]

;3

#DATA_ENABLE: FIELD 36-36, WIDTH 1, DEFAULT disable
VALUES [
enable 1
disable 0
]

;4

#address_generator: FIELD 26-35, WIDTH 10, DEFAULT nop

OPCODE_BIT_NOTATIONS [
cc ( ;Comparison register number
c0-c3
)

rrr ( ;Three bit address register number
R0-R7
)

rrrr ( ;Four bit address register number
r0-r15
)

bb ( ;Base (offset) register number
b0-b3
)

ii ( ;Initialisation register number
i0-i3
)

pp ( ;Two bit precision code
p0-p3
)

x ( ;One bit control bit
x0-x1
)

]

VALUES [

; looping instructions
yinc 1011ccrrrr
ydec 1010ccrrrr
yadd 11ccbb1rrr
ysub 11ccbb0rrr

; register transfer instructions:
yrtr 000101rrrr
yrtb 0011bbrrrr
yrtc 0010ccrrrr
dti 00001111ii
itr 1000iirrrr
btr 0100bbrrrr
rtd 000100rrrr
ctd 00001100cc
btd 00001101bb
itd 00001110ii

; logical and shift instructions
yor 0111bbrrrr
yand 0110bbrrrr
yxor 0101bbrrrr
yasr 000111rrrr
ylsl 000110rrrr

rst 0000000001
dtcr 0000101110
crtd 0000101111
seti 0000100iix
setp 00001010pp
sety 000001001x
selr 000001101x
selb 000001100x
setu 000001011x
seta 000001010x

wra 0000101100
rda 0000101101
lda 0000011110

ydty 0000011111
yrev 1001bbrrrr
nop 0000000000
]

;5

#ag_gen_pin: FIELD 25-25, WIDTH 1, DEFAULT nodsel

VALUES [
nodsel 0
dsel 1
]

#ag_air_select: FIELD 24-24, width 1, default noair
VALUES [
air 1
noair 0
]

;7

#data_mem_control: FIELD 22-23, WIDTH 2, default norw
VALUES [
rd 11
wr 10
norw 00
]

;8

#mac: FIELD 16-21, WIDTH 6, default nop
VALUES [
nop 000000
ckmr 000100
x=bus 001000
ls=bus 010000
ms=bus 010100
ex=bus 010010
ls=ms 010001
ms=ex 010101
ex=signextms 010011
ms=signextls 010111
bus=ex 001101
bus=ex(sl) 001100
bus=ms 011001
bus=ms(sl) 011000
bus=ms(sat) 011011
bus=ms(slsat) 011010
bus=ls 011101
bus=ls(sl) 011100
bus=ls(sat) 011111
bus=ls(slsat) 011110
y=bus_ckmr_xus*yus 100000
y=bus_ckmr_-xus*yus 100001
y=bus_ckmr_xus*yus+mr 100010
y=bus_ckmr_-xus*yus+mr 100011
y=bus_ckmr_xus*yus-mr 100110
y=bus_ckmr_-xus*yus-mr 100111
y=bus_ckmr_xtc*yus 101000
y=bus_ckmr_-xtc*yus 101001
y=bus_ckmr_xtc*yus+mr 101010
y=bus_ckmr_-xtc*yus+mr 101011
y=bus_ckmr_xtc*yus-mr 101110
y=bus_ckmr_-xtc*yus-mr 101111
y=bus_ckmr_xus*ytc 110000
y=bus_ckmr_-xus*ytc 110001
y=bus_ckmr_xus*ytc+mr 110010
y=bus_ckmr_-xus*ytc+mr 110011
y=bus_ckmr_xus*ytc-mr 110110
y=bus_ckmr_-xus*ytc-mr 110111
y=bus_ckmr_xtc*ytc 111000
y=bus_ckmr_-xtc*ytc 111001
y=bus_ckmr_xtc*ytc+mr 111010
y=bus_ckmr_-xtc*ytc+mr 111011
y=bus_ckmr_xtc*ytc-mr 111110
y=bus_ckmr_-xtc*ytc-mr 111111
]

;9

#mac_rnd_pins: FIELD 14-15, WIDTH 2, DEFAULT nornd
VALUES [
rnd14 01
rnd15 10
nornd 00
]

;10

#mac1: FIELD 8-13, WIDTH 6, default nop1
VALUES [
nop1 000000
ckmr1 000100
x=bus1 001000
ls=bus1 010000
ms=bus1 010100
ex=bus1 010010
ls=ms1 010001
ms=ex1 010101
ex=signextms1 010011
ms=signextls1 010111
bus=ex1 001101
bus=ex(sl)1 001100
bus=ms1 011001
bus=ms(sl)1 011000
bus=ms(sat)1 011011
bus=ms(slsat)1 011010
bus=ls1 011101
bus=ls(sl)1 011100
bus=ls(sat)1 011111
bus=ls(slsat)1 011110
y=bus_ckmr_xus*yus1 100000
y=bus_ckmr_-xus*yus1 100001
y=bus_ckmr_xus*yus+mr1 100010
y=bus_ckmr_-xus*yus+mr1 100011
y=bus_ckmr_xus*yus-mr1 100110
y=bus_ckmr_-xus*yus-mr1 100111
y=bus_ckmr_xtc*yus1 101000
y=bus_ckmr_-xtc*yus1 101001
y=bus_ckmr_xtc*yus+mr1 101010
y=bus_ckmr_-xtc*yus+mr1 101011
y=bus_ckmr_xtc*yus-mr1 101110
y=bus_ckmr_-xtc*yus-mr1 101111
y=bus_ckmr_xus*ytc1 110000
y=bus_ckmr_-xus*ytc1 110001
y=bus_ckmr_xus*ytc+mr1 110010
y=bus_ckmr_-xus*ytc+mr1 110011
y=bus_ckmr_xus*ytc-mr1 110110
y=bus_ckmr_-xus*ytc-mr1 110111
y=bus_ckmr_xtc*ytc1 111000
y=bus_ckmr_-xtc*ytc1 111001
y=bus_ckmr_xtc*ytc+mr1 111010
y=bus_ckmr_-xtc*ytc+mr1 111011
y=bus_ckmr_xtc*ytc-mr1 111110
y=bus_ckmr_-xtc*ytc-mr1 111111
]

#mac_rnd_pins1: FIELD 6-7, WIDTH 2, DEFAULT nornd1

VALUES [
rnd141 01
rnd151 10
nornd1 00
]

;12

#mac_controls: FIELD 5-5, WIDTH 1, DEFAULT macdisable

VALUES [
macenable 1
macdisable 0
]

;13

#mac1_control: FIELD 4-4, WIDTH 1, DEFAULT mac1disable
VALUES [
mac1enable 1
mac1disable 0
]

;14

#latch_control: FIELD 2-3, WIDTH 2, DEFAULT norwlatch
VALUES [
rdlatch 11
wrlatch 10
norwlatch 00
]

;15

#latch1_control: FIELD 0-1, WIDTH 2, DEFAULT norwlatch1
VALUES [
rdlatch1 11
wrlatch1 10
norwlatch1 00
]


; Object file for 1-D convolution


p0050 00000A10F0000000
p0051 00000A9302000000
p0052 00001FBCB000000
p0053 000Q24B4D0000000
p0054 0000401400000000
p0055 00003FF4D0000000
p0056 000000100010100F



p0057 0180003000000000
p0058 OSCXXDCaOOOOOCJOOO
p0059 O3O0ODD0OCXXXXXX)
p005A O76CXX)D00CX)O0CX)0
p005B 0740005414000000
p005C 0700005B00000000
p005D 0000000B00C80000
p005E 0000000A10E20000
p005F 0000000848000003
p0060 08000BBA14C02203
p0061 064000ECF0000000
p0062 000000300004080F
p0063 00000000001D2223
p0064 0000000000000000
p0065 000000141420200F
p0066 0000007CC8000000
p0067 0BA0000BD0801D12
p0068 0E600&9000000000
p0069 0000000A00000000
p006A 0720007800000000
p006B 0000000B00C80e03
p006C 0000000A10e20000
p006D 0S200D7A0eC»2203
p006E 066000DCF0000000
p006F 000000100020200F
p0070 0000009CE8000000
p0071 0000000B189D0020
p0072 0BA0000A1CB01D12
p0073 0E600D5000000000


; Object file for MATRIX multiplication
$h

p0050 00000A1302000001
p0051 00002010F4000000
p0052 00002090F8000000
p0053 000Q2A14aD000000
p0054 000000100010100F
p0055 0180003000000000



p0056 03000BD000000000
p0057 0740005000000000
p0058 0720003844000000
p0059 0000000888000000
p005A 0700005400000000
p005B 0000000800080003
p005C 0000000B44E20000
p005D 00000B7B88C02203
p005E 0000000000000000
p005F 000000100020200F
p0060 0000009EE4000000
p0061 000000C®8C9D0020
p0062 0620000B8C001D12
p0063 0BA0009EE3000000
p0064 0E600B5000000000
p0065 0640000414000000
p0066 0BA0009CF4000000
p0067 0E600B1314000000



APPENDIX F 


; THIS DEF. FILE IS FOR THE MICROCODED SYSTEM 3
TITLE MICROSYSTEM
DEFN_FILE "SMACRO2.DAT"

WORD WIDTH 74 BITS
DATA_LOC_WIDTH 16
PROGRAM_LOC_WIDTH 74
;1

#sequencer: FIELD 67-73, WIDTH 7, DEFAULT cont
OPCODE_BIT_NOTATIONS [
kk ( ;Conditions
00 unconditional
01 notflag
10 flag
11 sign
)

cc ( ;Selects the relevant register (R3-R0) and
C0-C3 ;/or counter (C3-C0).
)

ii ( ;Decides number to be added in case of
I0-I3 ;AIRSP instruction.
)

]

INSTR_NOT_AVAILABLE [
jrc(sign) (c0)
jrc(sign) (c1)
jrc(sign) (c2)
jrc(sign) (c3)
branch(sign) (c0)
branch(sign) (c1)
branch(sign) (c2)
branch(sign) (c3)

]

BRANCH_INSTR_WHICH_NEED_DATA [

ABSOLUTE (
jda
jdrst
)

RELATIVE (
jsr
jdr
)

]

VALUES [

; jump & branch instruction
jpcof 0010101
jpcnf 0110101
jtwo 101kk01
jda 111kk11
jdr 111kk01
jdi 101kk10
jdrst 10011cc
jrs 11011cc
jsa 111kk00
jsr 111kk10
rtn 101kk11
branch 100kkcc

; stack operation
; subroutine stack
psdss 0011110
ppssd 0111110
wrssp 0001110
rdssp 0101100
dssp 0000010

; register stack
sgsp 0000111
slsp 0000110
rdrsp 0101111
wrrsp 0001100
pspc 0100011
psgsp 0000101
ppgsp 0000100
psdrs 0011111
pprsd 0111111
airsp 01010ii
sirsp 0001111
s4rsp 0111100

; status register operations:
rdsr 0101110
wrsr 0011100
pssr 0100001
ppsr 0100010

; counter operations:
wrcntr 01110cc
clrs 0010100
sets 0110100
pscntr 00010cc
ppcntr 00110cc
dccntr 01100cc
ifcdec 101kk00

; interrupt control:
ccir 0010001
cair 0000001
rtnir 0000011
rdiv 0101101
wriv 0001101
irmbc 0010011
irmbs 0010010
disir 0010110
enair 0110110
slir 0010111
stir 0110111
slrivp 0011101

; relative address width controls
rel16 0100100
rel12 0100111
rel8 0100110

; miscellaneous instructions:
cont 0000000
idle 0010000
ihc 0100101
wcs 0100000

]


;2

#data: FIELD 51-66, WIDTH 16, DEFAULT zero

VALUES [

zero 0000000000000000
]

;3

#DATA_ENABLE: FIELD 50-50, WIDTH 1, DEFAULT disable
VALUES [
enable 1
disable 0
]

?4 

#address_jgenerator : FIELD W-49, WIDTH 10,DEF*^JLT nop 

OP(33DE_BIT_NOTATIGNS C 
cc ( ; Cdrrpar ison Register nurrber 

c0-c3 
) 

rrr ( jThree bit Address register nurrber 
R0-R7 
) 

rrrr ( ?Four bit Address register nuirber 
r0-rl5 
) 

bb ( jBase (offset) register nurrber 
bOHaS 



ii ( ; Initialisation register nufrtoer 

i0-i3 
) 

pp C ;Two bit precision code 
p0-p3 

) 

X ( ;One bit control bit 
xO-xl 
) 

3 

V/5LUEB C 

; looping instructions
yinc 1011ccrrrr
ydec 1010ccrrrr
yadd 11ccbb1rrr
ysub 11ccbb0rrr

; register transfer instructions:
yrtr 000101rrrr
yrtb 0011bbrrrr
yrtc 0010ccrrrr
dti 00001111ii
itr 1000iirrrr
btr 0100bbrrrr
rtd 000100rrrr
ctd 00001100cc
btd 00001101bb
itd 00001110ii

; logical and shift instructions
yor 0111bbrrrr
yand 0110bbrrrr
yxor 0101bbrrrr
yasr 000111rrrr
ylsl 000110rrrr

rst 0000000001
dtcr 0000101110
crtd 0000101111
seti 0000100iix
setp 00001010pp
sety 000001001x
selr 000001101x
selb 000001100x
setu 000001011x
seta 000001010x

wra 0000101100
rda 0000101101
lda 0000011110

ydty 0000011111
yrev 1001bbrrrr

nop 0000000000

] 
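The 10-bit templates above mix fixed bits with operand letters declared under OPCODE_BIT_NOTATIONS. The Python sketch below shows one way an assembler might fill those letters. The listing does not specify this, so the sketch assumes two things: operands are written left to right in template order, and each register number is stored as unsigned binary with the most significant bit first. The function `expand` and the table `NOTATION_PREFIX` are hypothetical names.

```python
import re

# Hypothetical: operand letter in a template -> accepted operand prefixes,
# taken from the OPCODE_BIT_NOTATIONS list (rrr uses R0-R7, rrrr uses r0-r15).
NOTATION_PREFIX = {"c": "c", "r": "rR", "b": "b", "i": "i", "p": "p", "x": "x"}

def expand(template: str, operands: list[str]) -> str:
    """Fill each run of notation letters (e.g. 'cc', 'rrrr') with an operand.

    Assumes operands appear in left-to-right template order and that the
    register number is encoded as unsigned binary, MSB first.
    """
    runs = list(re.finditer(r"([a-z])\1*", template))
    if len(runs) != len(operands):
        raise ValueError("operand count does not match template")
    bits = list(template)
    for run, operand in zip(runs, operands):
        letter, width = run.group(1), len(run.group(0))
        if operand[:1] not in NOTATION_PREFIX[letter]:
            raise ValueError(f"{operand!r} does not fit field {run.group(0)!r}")
        value = int(operand[1:])
        if value >= 2 ** width:
            raise ValueError(f"{operand!r} does not fit in {width} bits")
        bits[run.start():run.end()] = format(value, f"0{width}b")
    return "".join(bits)

print(expand("1011ccrrrr", ["c1", "r5"]))  # yinc c1, r5 -> 1011010101
```

Under these assumptions, an out-of-range operand such as `c4` is rejected rather than silently truncated to two bits.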

;5 

#ag_gen_pin: FIELD 39-39, WIDTH 1, DEFAULT nodsel

VALUES [
nodsel 0
dsel 1
]

;6

#ag_air_select: FIELD 38-38, width 1, default noair
VALUES [
air 1
noair 0
]

;7

#data_mem_control: FIELD 36-37, WIDTH 2, default norw
VALUES [
rd 11
wr 10
norw 00
]

;8

#mac: FIELD 30-35, WIDTH 6, default nop
VALUES [
nop 000000
ckmr 000100
x=bus 001000
ls=bus 010000
ms=bus 010100
ex=bus 010010
ls=ms 010001
ms=ex 010101
ex=signextms 010011
ms=signextls 010111
bus=ex 001101
bus=ex(sl) 001100
bus=ms 011001
bus=ms(sl) 011000
bus=ms(sat) 011011
bus=ms(slsat) 011010
bus=ls 011101
bus=ls(sl) 011100
bus=ls(sat) 011111
bus=ls(slsat) 011110

y=bus_ckiTr_xusKyus 100000 

y=t)us_ckiTr_-xus«yus 100001 

y=t)us_cknir_xus»yus+iTr 100010 

y=bus_ckn-r_-xus«yus+rrr 100011 

y=bus_ckrrr_xussyus-fT(r 100110 

y =t) us_c kmr _-x ussy us -rrr 100111 

y=t)us_ckmr_xtcsyus 101000 

y=bus_ckrTir_-xtc»yus 101001 

y=bus_ckrrr_xtc»yus-HTr 101010 

y=t>us_ckmr_-xtc»yus+rrr 101011 

y=t)us_ck[Tr_xtc»yus-iTir 101110 

y=t)us_ckmr_-xtcsyus-frr 101111 

y=bus_ckmr_xussytc 110000 



y=bus_ckrrr_xus«ytc-i-mr IIODIO 

y=bus_ckrTir_-xus«ytc+rrrr 110011 

y=t)us_ckn'ir_xus»ytc-tTir 110110 

y=bus_ckfTr _-xuc»ytc -irr 1 101 1 1 

y=t)us_ckrrr_xtcsytc 111000 

y=tDUs_ckmr_-xtc»ytc 111001 

y=bus_ckrr(r_xtc»ytc+rrir 111010 

y=bus_ckrTr_-xtc»ytc-HTir 111011 

y=t3us_cktTtr_xtc«ytc-Trr 111110 

y=bus_ckmr_-xtc»ytc-frir 111111 

3 
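Reading down the multiply codes of the mac field, each bit appears to play a fixed role:

- bit 5 marks the multiply group;
- bit 4 selects the `ytc` operand over `yus`;
- bit 3 selects `xtc` over `xus`;
- bit 0 adds the leading minus sign;
- bits 2-1 select the plain form (00), the `+mr` form (01) or the `-mr` form (11).

This decoding is inferred from the table alone, not from a hardware description. The Python sketch below (hypothetical function name) regenerates mnemonics of the form `y=bus_ckmr_-xtc*yus+mr` under that assumption, which is a quick way to cross-check the transcribed table.

```python
def mac_multiply_mnemonic(code: int) -> str:
    """Build the multiply mnemonic for a 6-bit mac code.

    Bit roles are inferred from the mac VALUES table, not from the hardware.
    """
    if not 0 <= code < 64 or not code & 0b100000:
        raise ValueError("not a multiply code")
    x = "xtc" if code & 0b001000 else "xus"   # bit 3: x operand format
    y = "ytc" if code & 0b010000 else "yus"   # bit 4: y operand format
    sign = "-" if code & 0b000001 else ""     # bit 0: leading minus sign
    # bits 2-1: 00 plain, 01 "+mr", 11 "-mr"; 10 does not occur in the table
    suffix = {0b00: "", 0b01: "+mr", 0b11: "-mr"}.get((code >> 1) & 0b11)
    if suffix is None:
        raise ValueError("bits 2-1 = 10 is not used in the table")
    return f"y=bus_ckmr_{sign}{x}*{y}{suffix}"

# Regenerate the multiply rows for comparison with the listing.
for c in range(0b100000, 0b1000000):
    if (c >> 1) & 0b11 != 0b10:
        print(mac_multiply_mnemonic(c), format(c, "06b"))
```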


;9

#mac_rnd_pins: FIELD 28-29, WIDTH 2, DEFAULT nornd
VALUES [
rnd14 01
rnd15 10
nornd 00
]

;10

#address_generator1: FIELD 18-27, WIDTH 10, DEFAULT nop1

OPCODE_BIT_NOTATIONS [

cc ( ;Comparison Register number
c0-c3
)

rrr ( ;Three bit Address register number
R0-R7
)

rrrr ( ;Four bit Address register number
r0-r15
)

bb ( ;Base (offset) register number
b0-b3
)

ii ( ;Initialisation register number
i0-i3
)

pp ( ;Two bit precision code
p0-p3
)

x ( ;One bit control bit
x0-x1
)

]

VALUES [ 

; looping instructions 

yinc1 1011ccrrrr
ydec1 1010ccrrrr
yadd1 11ccbb1rrr
ysub1 11ccbb0rrr

; register transfer instructions
yrtr1 000101rrrr
yrtb1 0011bbrrrr
yrtc1 0010ccrrrr
dti1 00001111ii
itr1 1000iirrrr
btr1 0100bbrrrr
rtd1 000100rrrr
ctd1 00001100cc
btd1 00001101bb
itd1 00001110ii

; logical and shift instructions
yor1 0111bbrrrr
yand1 0110bbrrrr
yxor1 0101bbrrrr
yasr1 000111rrrr
ylsl1 000110rrrr

rst1 0000000001
dtcr1 0000101110
crtd1 0000101111
seti1 0000100iix
setp1 00001010pp
sety1 000001001x
selr1 000001101x
selb1 000001100x
setu1 000001011x
seta1 000001010x
wra1 0000101100
rda1 0000101101
lda1 0000011110
ydty1 0000011111
yrev1 1001bbrrrr
nop1 0000000000


]

;11

#ag1_gen_pin: FIELD 17-17, WIDTH 1, DEFAULT nodsel1

VALUES [
nodsel1 0
dsel1 1
]

;12

#ag1_air_select: FIELD 16-16, width 1, default noair1

VALUES [
air1 1
noair1 0
]

;13

#data1_mem_control: FIELD 14-15, WIDTH 2, default norw1
VALUES [
rd1 11
wr1 10
norw1 00
]

;14



#mac1: FIELD 8-13, WIDTH 6, default nop1
VALUES [
nop1 000000
ckmr1 000100
x=bus1 001000
ls=bus1 010000
ms=bus1 010100
ex=bus1 010010
ls=ms1 010001
ms=ex1 010101
ex=signextms1 010011
ms=signextls1 010111
bus=ex1 001101
bus=ex(sl)1 001100
bus=ms1 011001
bus=ms(sl)1 011000
bus=ms(sat)1 011011
bus=ms(slsat)1 011010
bus=ls1 011101
bus=ls(sl)1 011100
bus=ls(sat)1 011111
bus=ls(slsat)1 011110


y=bus_ckmr_xus*yus1 100000
y=bus_ckmr_-xus*yus1 100001
y=bus_ckmr_xus*yus+mr1 100010
y=bus_ckmr_-xus*yus+mr1 100011
y=bus_ckmr_xus*yus-mr1 100110
y=bus_ckmr_-xus*yus-mr1 100111
y=bus_ckmr_xtc*yus1 101000
y=bus_ckmr_-xtc*yus1 101001
y=bus_ckmr_xtc*yus+mr1 101010
y=bus_ckmr_-xtc*yus+mr1 101011
y=bus_ckmr_xtc*yus-mr1 101110
y=bus_ckmr_-xtc*yus-mr1 101111
y=bus_ckmr_xus*ytc1 110000
y=bus_ckmr_-xus*ytc1 110001
y=bus_ckmr_xus*ytc+mr1 110010
y=bus_ckmr_-xus*ytc+mr1 110011
y=bus_ckmr_xus*ytc-mr1 110110
y=bus_ckmr_-xus*ytc-mr1 110111
y=bus_ckmr_xtc*ytc1 111000
y=bus_ckmr_-xtc*ytc1 111001
y=bus_ckmr_xtc*ytc+mr1 111010
y=bus_ckmr_-xtc*ytc+mr1 111011
y=bus_ckmr_xtc*ytc-mr1 111110
y=bus_ckmr_-xtc*ytc-mr1 111111
]

;15 


#mac1_rnd_pins: FIELD 6-7, WIDTH 2, DEFAULT nornd1
VALUES [
rnd141 01
rnd151 10
nornd1 00
]

;16

#latch_control: FIELD 4-5, WIDTH 2, DEFAULT norwlatch
VALUES [
rdlatch 11
wrlatch 10
norwlatch 00
]

;17

#latch1_control: FIELD 2-3, WIDTH 2, DEFAULT norwlatch1
VALUES [
rdlatch1 11
wrlatch1 10
norwlatch1 00
]

;18

#mac_control: FIELD 1-1, WIDTH 1, DEFAULT macdisable
VALUES [
macenable 1
macdisable 0
]

;19

#mac1_control: FIELD 0-0, WIDTH 1, DEFAULT mac1disable
VALUES [
mac1enable 1
mac1disable 0
]
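Each field above is declared with an inclusive low-high bit range, and every default in this part of the listing is all zeros. A microword can therefore be assembled by shifting each field's binary value to its low bit and OR-ing the results together. The Python sketch below shows this for a handful of the fields. The names `FIELDS` and `pack` are hypothetical, and treating bit 0 as the least significant bit of the word is an assumption. Fields declared before this part of the listing, such as the jump field, are left out.

```python
# Hypothetical subset of the FIELD declarations: name -> (low bit, high bit),
# inclusive, in the same low-high order as the listing.
FIELDS = {
    "data": (51, 66),
    "DATA_ENABLE": (50, 50),
    "address_generator": (40, 49),
    "data_mem_control": (36, 37),
    "mac": (30, 35),
    "mac_rnd_pins": (28, 29),
    "mac_control": (1, 1),
    "mac1_control": (0, 0),
}

def pack(values: dict[str, str]) -> int:
    """OR each field's binary value into place; omitted fields stay zero.

    Assumes bit 0 of the FIELD numbering is the word's least significant bit.
    """
    word = 0
    for name, bits in values.items():
        lo, hi = FIELDS[name]
        width = hi - lo + 1
        if len(bits) != width or set(bits) - {"0", "1"}:
            raise ValueError(f"{name} needs {width} binary digits, got {bits!r}")
        word |= int(bits, 2) << lo
    return word

# A "+mr" multiply code in the mac field, with macenable set in mac_control.
word = pack({"mac": "100010", "mac_control": "1"})
print(hex(word))  # 0x880000002
```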


; Object file for MATRIX multiplication

$h
p0100 OCXX>D2»?OD0OO3O2OOC)O
p0101 00(>D08043D000CF400(X)
p0102 OOOOOABAOOCOCMCaOOCK)
p0103 C<XXX)AB4<XXXX>4aX)CW
p0104 00000C0400040000103C
p0105 CXD60O0CXX00O0CXXXX)00
p0106 (>3FOOe6COOOOOCXXXXXX)
p0107 OCFCXDe94OO0CX)OC)0O0CX)
p0108 01IX)0004000000C«C)000
p0109 01(30CX2bll000e44OCXX)
p010A 01C00015000CX>1000000
p010B 00(XXX)32CX3320EO0CBCO
p010C 0200085EC1388B04E200
p010D 00000000000000000000
p010E 00000004000B0000203C
p010F 00000000000000000000
p0110 0000000000074B48002E
p0111 020e08540000C©40?D01
p0112 0000003400000CEB0000
p0113 0000003400000CECXXXX5
p0114 0210084CC00003000000
$$ 
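Where a record is legible, it appears to consist of `p`, a four-digit hexadecimal address, a space, and twenty hexadecimal digits (80 bits) of microcode. The listing does not spell this layout out. Under that reading, a record can be split as in the Python sketch below. `parse_record` and `field` are hypothetical helpers, and `field` further assumes that bit 0 of the FIELD numbering is the least significant bit of the hex word. Records containing illegible characters raise `ValueError`.

```python
def parse_record(line: str) -> tuple[int, int]:
    """Split an object-file record into (address, word).

    Assumed layout: 'p' + 4 hex address digits, a space, 20 hex data digits.
    """
    parts = line.split()
    if len(parts) != 2:
        raise ValueError(f"unexpected record: {line!r}")
    tag, data = parts
    if tag[:1].lower() != "p" or len(tag) != 5 or len(data) != 20:
        raise ValueError(f"unexpected record: {line!r}")
    return int(tag[1:], 16), int(data, 16)  # int() rejects non-hex characters

def field(word: int, lo: int, hi: int) -> int:
    """Extract bits lo..hi inclusive, assuming bit 0 is the word's LSB."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

address, word = parse_record("p0102 000007ED320000000000")
print(hex(address), hex(field(word, 48, 63)))  # 0x102 0x7ed
```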

; Object file for 1-D convolution 

p0100 OOOOO2843CX)0O0F0OOOO
p0101 000002A4C00000000000
p0102 000007ED320000000000
p0103 0000092C000004D00000
p0104 00000AB4000004D00000
p0105 OOOCX3A7C»OCXXHD(XOC)0
p0106 (XXXXXX>KXX)40000103C
p0107 006<XXXXX<XXX)0000000
p0108 ooFooe7cxxmx)oc)oooo
p0109 C)CFOCmDO(XXXXXXX>000
p010A O1D0C)O34OOOOOOOOOOOO
p010B 01D000140000CXXXXXXX)
p010C 010cxx)1500000e000000
p010D 00000002C2320800C800
p010E 02000e6EB03aaA10e2CX)
p010F 0000001F320000000030
p0110 oooocxxmx)iooooo0oc
p0111 0CXXX>D0CXXX)7400C)2Z2E
p0112 oocxxxmxxxxxpooooo
p0113 01900004000GOCXXG03C
p0114 0O00(X)0CO00CXXXXXXXX)
p0115 02E80000000COB989D01
p0116 03990e640000000(XXXX)
p0117 000cxx)02820000000000
p0118 01C0001E000000000000
p0119 (X)OC)CXX)2C0320EOOC800
p011A 020008CE82388A10E200
p011B 0CX300CX273AC)00CXXXX)00
p011C 0CXX)0004CXX»C)0C)0203C
p011D 019a0CX34O0O00CFO000O
p011E 0000000000074A1CB02E
p011F 02E8000000000B189D01
p0120 C)3980eDKXXXX>DOOC)OW



